ABSTRACT

Within  the  context  of  naval  warfare,  commanders  and  their  staffs  require  access  to 
a  wide  range  of  information  to  carry  out  their  duties.  This  information  provides  them 
with  the  knowledge  necessary  to  determine  the  position,  identity  and  behavior  of  the 
enemy.  This  document  is  concerned  with  the  fusion  of  identity  declarations  through  the 
use  of  statistical  analysis  rooted  in  the  Dempster-Shafer  theory  of  evidence.  It  proposes 
to  hierarchically  structure  the  declarations  according  to  STANAG  4420  (Display 
Symbology  and  Colours  for  NATO  Maritime  Units).  More  specifically  the  aim  of  this 
document  is  twofold:  to  explore  the  problem  of  fusing  identity  declarations  emanating 
from  different  sources,  and  to  offer  the  decision  maker  a  quantitative  analysis  based  on 
statistical  methodology  that  can  enhance  his/her  decision  making  process  regarding  the 
identity  of  detected  objects. 


RÉSUMÉ

In a naval warfare context, a ship's commander relies on an extensive range of
information to analyze the tactical situation. This information includes, in particular,
intelligence concerning the position, identity and behavior of detected objects. This
document focuses on the fusion of identity declarations by means of the Dempster-Shafer
theory of evidence. It proposes to structure the identity declarations hierarchically
using NATO's STANAG 4420 standard. A study of the problem of fusing identity
declarations originating from several sources leads us to offer the commander a
quantitative analysis, based on a statistical methodology, that can assist him/her in the
decision making process regarding the identity of detected objects.

EXECUTIVE SUMMARY


In  today's  naval  warfare,  commanders  and  their  staffs,  who  are  both  users  and 
active  elements  of  command  and  control  systems,  require  access  to  a  wide  range  of 
information  to  carry  out  their  duties.  In  particular,  their  actions  are  based  on  information 
concerning  the  position,  identity  and  behavior  of  other  vessels  in  their  vicinity.  The 
position  information  determines  where  objects  are,  whereas  the  identity  information 
determines  what  they  are.  Behavioral  information  is  concerned  with  what  the  objects  are 
doing.  In  warfare,  no  one  piece  of  information  can  be  accepted  as  complete  truth.  In 
order  to  lessen  the  damaging  effects  of  poor  quality  evidence,  the  combination  of 
information  from  every  possible  source  is  of  primary  importance.  This  combination 
process  has  often  been  carried  out  manually  but  in  order  to  cope  with  the  ever  increasing 
flow  of  information,  automation  has  surfaced  as  a  possible  option  for  the  fusion  of 
positional  and  identity  information. 

This document is concerned with the use of the Dempster-Shafer theory of
evidence for the fusion of identity declarations within a naval environment. It proposes
to hierarchically structure the identity declarations according to NATO's STANAG 4420
charts,  which  provide  a  better  base  for  achieving  interoperability  in  information  exchange 
between  nations  than  uncontrolled  alternatives. 

The  Bayesian  approach  is  also  investigated  but  is  found  to  suffer  from  major 
deficiencies  in  a  hierarchical  context,  when  fully  specified  likelihoods  are  not  available. 
Other  problems  associated  with  this  approach  are  the  coding  of  ignorance  and  the  strict 
requirements  on  the  belief  of  a  hypothesis  and  its  negation. 

One  drawback  of  the  Dempster-Shafer  evidential  theory  is  the  long  calculation 
time  required  by  its  high  computational  complexity.  Due  to  the  hierarchical  nature  of  the 
evidence,  an  algorithm  proposed  by  Shafer  &  Logan  is  implemented  which  reduces  the 
calculations  from  exponential  to  linear  time  proportional  to  the  number  of  nodes  in  the 
tree.  A  semi-automated  decision  making  technique,  based  on  belief  and  plausibility 
values,  is  then  described  to  select  alternatives  which  best  support  the  combined  identity 
declarations.  The  final  decision  will  be  taken  by  the  decision  maker,  because  he/she 
remains  an  important  part  of  the  process  and  because  the  choice  of  the  final  identity  is 
typically  scenario  and  mission  dependent. 

This  document  has  only  begun  to  investigate  the  use  of  the  Dempster-Shafer 
approach  in  the  naval  environment.  In  fact,  the  various  concepts  studied  could  be 
applicable  to  the  domain  of  wide  area  fusion  within  the  framework  of  a  Communications, 
Command,  Control  and  Intelligence  (C3I)  system. 


1.0 INTRODUCTION

In  today's  naval  warfare,  commanders  and  their  staffs,  who  are  both  users  and 
active  elements  of  command  and  control  systems,  require  access  to  a  wide  range  of 
information  to  carry  out  their  duties.  In  particular,  their  actions  are  based  on  information 
concerning  the  position,  identity  and  behavior  of  other  vessels  in  their  vicinity  (Wilson, 
Ref.  1).  The  position  information  determines  where  objects  are,  whereas  the  identity 
information  determines  what  they  are.  Behavioral  information  is  concerned  with  what 
the  objects  are  doing. 

In  warfare,  no  one  piece  of  information  can  be  accepted  as  complete  truth.  In 
order  to  lessen  the  damaging  effects  of  poor  quality  evidence,  the  combination  of 
information  from  every  possible  source  is  of  primary  importance.  This  combination 
process  has  often  been  carried  out  manually  but,  in  order  to  cope  with  the  ever  increasing 
flow  of  information,  automation  has  surfaced  as  a  possible  option  for  the  fusion  of 
positional  and  identity  information. 

This  document  is  concerned  with  the  investigation  of  automated  identification 
techniques  through  the  use  of  statistical  analysis  rooted  in  the  Dempster-Shafer  theory  of 
evidence.  More  specifically,  the  aim  of  this  document  is  twofold:  to  explore  the  problem 
of  fusing  identity  information  emanating  from  different  sources,  and  to  offer  the  decision 
maker  a  quantitative  analysis  based  on  statistical  methodology  that  can  enhance  his/her 
decision  making  process  regarding  the  identity  of  detected  objects. 

Chapter  2  describes  the  problem  facing  naval  commanders  and  gives  a  brief 
survey  of  current  identity  information  sources  available  on  a  Canadian  Patrol  Frigate  type 
ship  for  above  water  warfare.  Potential  future  identity  information  sources  are  also 
mentioned.  Three  levels  of  information  fusion  architectures  are  also  discussed  which 
correspond  to  the  three  following  categories  of  identity  information:  sensor  signals, 
attribute  information  and  identity  declaration.  As  focus  is  brought  on  the  fusion  of 
identity  declarations,  it  is  suggested  that  NATO’s  STANAG  4420  (STAndard  NATO 
AGreement)  charts,  which  were  designed  for  representing  maritime  tactical  information, 
would  be  an  appropriate  tool  for  a  hierarchical  structuring  of  identity  declarations. 
Fusion approaches are then suggested based on the premise that identity declarations are
probabilistic  in  nature,  so  that  each  declaration  is  characterized  by  a  confidence  value. 

Chapters  3  and  4  describe  two  approaches  capable  of  fusing  uncertain 
information:  the  Bayesian  paradigm  of  probability  theory  and  the  Dempster-Shafer 
evidential  theory,  respectively.  These  approaches  are  described  in  terms  of  standard  and 
hierarchically  structured  information,  and  examples  are  offered.  Two  techniques  for 
combining  hierarchical  information  are  detailed:  the  first  is  due  to  Pearl  (Ref.  2)  and  the 
second  developed  by  Shafer  &  Logan  (Ref.  3).  Advantages  and  disadvantages  of  each 
approach  are  discussed. 

Chapter 5 briefly discusses decision making techniques based on the
Dempster-Shafer representation and pertaining to information structured in a hierarchical manner, in
order  to  provide  a  decision  making  approach  to  the  identity  declaration  fusion  problem. 

Chapter  6  proposes  an  identity  declaration  fusion  function  based  on  the  findings 
of  Chapters  2  to  5.  A  complete  example  is  provided;  it  details  the  inputs,  the  fusion 
results  and  decision  making  alternatives. 

The research and development activities described in this document were
performed at DREV between 1993 and 1995 under PSC12C.

2.0 PROBLEM DESCRIPTION

In  today's  naval  warfare,  commanders  and  their  staffs,  who  are  both  users  and 
active  elements  of  command  and  control  systems,  require  access  to  a  wide  range  of 
information  to  carry  out  their  duties.  In  particular,  their  actions  are  based  on  information 
concerning  the  position,  identity  and  behavior  of  other  vessels  in  their  vicinity  (Wilson, 
Ref.  1). 


The  prime  parameter  is  the  position  of  objects  surrounding  a  ship  because  identity 
and  behavior  mean  little  unless  they  can  be  associated  with  position.  The  position 
information  determines  where  objects  are,  whereas  the  identity  information  determines 
what  they  are.  The  third  type  of  information  is  behavior,  that  is,  finding  out  what  the 
objects  are  doing  in  order  to  assess  the  potential  threat.  Deductive  reasoning  plays  a  key 
role  in  determining  behavioral  information  (Wilson,  Ref.  1). 

In  warfare,  no  one  piece  of  information  can  be  accepted  as  complete  truth.  In 
order  to  lessen  the  damaging  effects  of  poor  quality  evidence,  the  combination  of 
information  from  all  available  sources  is  of  primary  importance.  This  combination 
process  has  often  been  carried  out  manually  but,  in  order  to  cope  with  the  ever  increasing 
flow  of  information,  automation  has  surfaced  as  a  possible  option.  This  is  particularly 
true  for  the  fusion  of  positional  information,  but  the  same  approach  could  also  be 
considered  for  identity  information. 

Waltz  &  Llinas  (Ref.  4)  state  that  identity  estimation  is  a  much  broader  problem 
than  positional  estimation  because  identity  is  a  much  broader  concept  than  position, 
involving  a  larger  number  of  variables.  Thus  to  better  understand  the  identity  fusion 
problem,  one  needs  to  look  at  these  variables  in  terms  of  the  origins  and  types  of  identity 
information.  The  following  section  gives  examples  of  sources  producing  identity 
information. 

2.1  Information  Sources 

In  a  maritime  environment,  various  surveillance  systems,  electronic  intelligence  and 
human  observations  are  examples  of  information  sources  available  to  the  commander. 
Two  types  of  source  can  be  distinguished:  organic  and  non-organic  sources.  When  the 
tactical picture is formed from data gathered by sources under the jurisdiction of the
commander, these sources are called organic. However, additional information is
sometimes  supplied  by  sources  outside  the  jurisdiction  of  the  commander;  these  are 
referred  to  as  non-organic  sources  (Gibson,  Ref.  5).  Output  from  these  sources  is 
partitioned  according  to  the  type  of  information  they  provide;  output  data  may  be 
characterized  as  either  positional  or  identity  information. 

Positional  information  represents  the  dynamic  parameters  describing  the 
movement  associated  with  an  object  (contact).  This  generally  includes  position,  velocity 
and  acceleration.  Identity  information  can  be  defined  as  declarations,  propositions  or 
statements  that  contribute  to  establish  the  identity  of  an  object  (Refs.  6-7).  Equivalently, 
identity  information  may  be  seen  as  information  from  various  sources  that  helps  in 
distinguishing  one  object  from  another.  Possible  values  for  identity  information  can  span 
the range from sensor signals, to attributes, to identity declarations, as depicted in Fig. 1.
The  sensor  signals  represent  some  characteristics  of  the  energy  sensed.  Attributes  such  as 
size,  shape,  degree  of  symmetry,  emitter  type,  etc.  are  inferred  from  these  characteristics. 
Identity  declarations  specify  the  detected  object;  in  the  Canadian  Navy,  for  example,  they 
can  consist  of  a  general  classification  of  which  the  observed  object  is  a  member  (surface 
combatant),  a  specific  type  of  ship  (frigate),  a  specific  class  (City  Class)  or  a  unique 
identity (Ville de Quebec). Therefore surface combatant, frigate, City Class and Ville de
Quebec are all examples of identity declarations. Identity declarations can also include
information  concerning  the  threat  designation  of  an  object:  pending,  unknown,  assumed 
friend,  suspect,  friend,  neutral  or  hostile.  It  is  noteworthy  that  for  some  authors  such  as 
Filippidis  &  Schapel  (Ref.  8),  the  term  identity  refers  only  to  the  threat  designation  of  an 
object.  Within  the  context  of  this  study,  these  threat  designations  will  be  classified  under 
"threat  category,"  which  is,  as  mentioned  earlier,  a  subdivision  of  identity  declarations. 

2.1.1  Current  Information  Sources 

Organic  sources  available  on  a  Canadian  Patrol  Frigate  (CPF)  type  ship  for  above 
water  warfare  include  surveillance  and  tracking  radars,  Electronic  Warfare  systems, 
Identification  Friend  or  Foe  (IFF)  systems,  operator  intervention  and  link  data  exchanged 
by  radio  links  among  a  group  of  platforms  under  the  same  command. 

Identity information is subdivided into three categories:
- sensor signals (ex: frequency, amplitude, pulse width)
- attribute information (ex: size, shape, degree of symmetry, emitter)
- identity declarations (ex: general classification, type, class, unique identity, threat category)

FIGURE 1 - Identity Information

In  general,  radars  provide  positional  information  in  terms  of  range,  azimuth  and 
velocity  components.  Electronic  Warfare  (EW)  systems  include  the  Electronic  Support 
Measure  (ESM)  and  the  Communication  Intercept  System  (CIS).  The  ESM  system 
intercepts  electromagnetic  radiation  from  active  emitters  in  the  environment  and  attempts 
to  measure  or  estimate  such  parameters  as  angle  of  arrival,  frequency,  pulse  repetition 
frequency,  pulse  width  and  scan  period.  These  measurements  are  then  compared  to 
known  characteristics  of  radar  transmitters.  The  ESM  system  thus  provides  positional 
information  (azimuth)  as  well  as  attribute  information  in  the  form  of  emitter  type.  It  may 
also  infer  the  platform  identity  (the  object  which  contains  the  emitter)  by  matching  the 
emitter  type  to  a  platform  data  base.  The  CIS  system  is  capable  of  responding  to 
programmable  tasking  to  search  the  communication  radio  frequencies  and  provide 
bearings  from  airborne  and  surface  originated  radio  frequency  signals.  The  IFF  system 
provides  information  (both  in  terms  of  position  and  identity)  about  a  target  when  a 
cooperative  target  has  responded  to  the  interrogation.  In  the  absence  of  an  answer,  only 
the  location  that  delimits  the  sector  in  which  the  interrogation  was  performed  is  available 
but  identity  declarations  may  be  inferred  (Refs.  7,  9).  Human  intervention  occurs  when 
the  operator  of  the  CIS  system  listens  to  the  signals  and  tries  to  estimate  any  attribute 
information  and/or  identity  declarations  (Ref.  7). 
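
The platform inference attributed above to the ESM system can be illustrated with a small lookup. The following Python sketch is illustrative only: the library contents, the emitter and platform names, and the rule of intersecting the candidate sets when several emitters are observed on the same object are all assumptions, not a description of an actual ESM data base.

```python
# Hypothetical emitter-to-platform library; every name here is illustrative.
EMITTER_LIBRARY = {
    "emitter_A": {"frigate", "destroyer"},
    "emitter_B": {"frigate"},
    "emitter_C": {"merchant"},
}


def candidate_platforms(emitter_types):
    """Return the platforms compatible with every identified emitter type.

    Assumption: all emitters in `emitter_types` were observed on the same
    object, so a candidate platform must be listed under each of them.
    An unknown emitter, or an empty input, yields an empty candidate set.
    """
    candidates = None
    for emitter in emitter_types:
        platforms = EMITTER_LIBRARY.get(emitter, set())
        candidates = platforms if candidates is None else candidates & platforms
    return set() if candidates is None else set(candidates)
```

A single emitter type usually leaves several candidate platforms, which is one reason why the resulting identity declaration is better treated as uncertain than as definitive.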

Intelligence  reports  and  information  from  communication  links  are  examples  of 
non-organic  sources  from  which  positional  and  identity  information  can  be  obtained. 

FIGURE 2 - Examples of Information Sources and Corresponding Outputs


Figure  2  gives  examples  of  information  sources  with  their  corresponding  outputs. 
The  latter  are  defined  in  terms  of  positional  information  and  identity  information,  which 
is  itself  subdivided  into  three  categories:  sensor  signals,  attribute  information  and  identity 
declarations. 

2.1.2  Advanced  Information  Sources 

To  cope  with  the  increasing  level  of  threat,  sophisticated  sensor  processing  is 
being  developed  to  provide  the  commander  with  more  accurate  and  timely  information 
concerning  the  position  and  identity  of  detected  objects.  Examples  of  such  technology 
advancement  follow. 

Some  of  the  most  promising  approaches  to  object  identification  by  active  radars 
include:  (1)  inverse-synthetic  aperture  radar,  which  gives  a  two-dimensional  image  of  an 
object; (2) radar-signal modulation, in which Doppler modulation of the signal provides
target-specific information; (3) resonance response to short radar pulses; and (4)
high-range-resolution radars (Ref. 10). It is expected that future radar systems will be capable
of  providing  attribute  information.  Also,  the  position  and  speed  calculated  by  radars 
could  be  used  to  infer  attribute  information. 

Many  countries  are  involved  in  the  development  of  advanced  naval  EW  systems 
(Refs.  11-12).  The  aim  of  these  systems  is  to  integrate  all  existing  onboard  electronic 
warfare  systems  -  advanced  electronic  support  measures  (ESM),  jammers,  passive  and 
active  decoys  -  to  provide  a  fully  coordinated  soft  kill  management.  It  is  anticipated  that 
the  advanced  ESM  system  will  be  able  to  identify  the  emitter  type  as  well  as  the  platform 
with  a  good  confidence  level. 

Infrared  imaging  sensors  afford  another  option  for  obtaining  identity  information. 
These devices sense the electromagnetic spectrum in the 3 to 12 µm waveband. They can
automatically  acquire  small  air  objects;  however,  the  classification  and  identification  is 
mostly  done  manually.  Studies  are  ongoing  to  automate  these  two  processes  (Ref.  13). 

These  advanced  information  sources  can  and  will  eventually  produce  attribute 
information  and  identity  declarations  in  an  autonomous  fashion;  in  that  sense,  they  are 
self-contained. Because of their sophisticated technology, they require comprehensive
libraries  of  own  and  enemy  signatures  as  well  as  powerful  processing  capabilities. 

Figure  3  gives  a  more  complete  summary  of  information  sources  with  their 
corresponding  outputs.  The  bold  lines  indicate  the  differences  between  Figs.  2  and  3.  A 
considerable  amount  of  information  pertaining  to  each  category  of  identity  information 
will  be  available.  Figure  3  is  not  exhaustive  in  terms  of  information  sources  and 
position/identity  information.  Its  aim  is  mainly  to  demonstrate  the  potentially  high 
quantity  of  identity  information  that  will  eventually  become  available  to  estimate  the 
identity  of  objects  surrounding  a  ship. 

2.2 Fusion of Identity Information

In  a  situation  of  intense  activity,  it  would  seem  impossible  to  handle  manually  the 
flow  of  information  pertaining  to  the  position  and  identity  of  objects.  Automation  offers 
a  viable  solution.  Hall  (Ref.  6)  states  that  fusion  of  identity  information  from  multiple 
sources  yields  both  qualitative  and  quantitative  benefits  because  it  takes  advantage  of  the 
relative  strengths  of  each  source,  resulting  in  an  improved  estimate  of  the  object's 
identity.  Quantitative  benefits  include  increased  confidence,  e.g.,  higher  probability  of 
correct  identification. 

In  order  to  automate  the  fusion  of  identity  information,  techniques  must  be 
devised  to  combine  identity  information  at  various  levels:  sensor  signals,  attribute 
information  and  identity  declarations.  This  can  be  accomplished  by  focusing  on  the 
sensor  processing  units:  energy,  signal  and  target  processing. 

The  energy  processing  unit  is  responsible  for  transforming  the  sensed  energy  into 
a  signal,  a  form  more  suitable  for  target  detection.  Theoretically,  the  output  of  this  stage 
can be sent to a fusion processor. This procedure is known as the signal level
architecture, whereby signals from sensors are combined. An example of application is
the fusion of pixels from imaging sensors to produce a single image (Ref. 14).

FIGURE 3 - Examples of Advanced Information Sources and Corresponding Outputs

The signal processing unit is responsible for the detection of objects and
assessment of the contact's position and attribute information. Again the output of this
stage can be sent to a fusion processor. This type of fusion is known as central level
architecture, where attribute information is combined. The procedure is as follows: a
series of target attributes is extracted from sensor measurements, then an inference is
drawn between attributes and target types known to possess certain attributes. The
feasibility of such an endeavor was shown by Donker (Ref. 15), as well as by Begin et al.
(Ref. 16), in the specific case of a Canadian Patrol Frigate. Central level architecture was
also demonstrated for an updated frigate in an AWW (Above Water Warfare)
environment by Paramax (Ref. 7) and Simard et al. (Ref. 17).

Finally,  in  the  target  processing  unit,  the  sensor  processes  many  contacts  related 
to  the  same  object  and  estimates  information  to  produce  identity  declarations.  This 
information  is  then  sent  to  the  fusion  processor  to  be  combined  with  other  identity 
declarations.  This  procedure  is  often  called  sensor  level  architecture.  The  following 
architecture terminology is also used in the literature: data level, feature level and
decision  level  fusion  architectures  (Ref.  6). 

Figure  4,  adapted  from  Paramax  (Ref.  7),  displays  the  three  fusion  architectures 
providing  the  structural  basis  for  fusing  identity  information  in  the  case  of  2  sensors.  It 
should  be  noted  that  the  output  from  each  level  of  processing  represents  the  three 
categories  of  identity  information  depicted  in  Fig.  1. 

An  advantage  of  the  signal  level  architecture  is  the  minimum  information  loss 
incurred  since  the  sensor  data  are  fused  directly  without  approximation  via  attributes  or 
identity  declarations  (Ref.  6).  However,  only  data  from  identical  sensors  can  be  fused. 
Also,  its  computational  requirements  are  very  high  due  to  the  complex  techniques 
required  to  fuse  signals.  The  central  level  approach  results  in  an  information  loss  from  the 
sensors  since  sensor  data  are  represented  by  attributes.  Nevertheless,  Refs.  (7,  16-17) 
have  demonstrated  the  potential  of  the  central  level  approach.  The  sensor  level  fusion 
architecture incurs a significant information loss compared to signal fusion, since data are
represented  via  identity  declarations.  The  information  processing  for  each  sensor  may 
thus  result  in  a  locally  rather  than  a  globally  optimized  solution,  because  the  fusion 
process  only  combines  local  decisions  in  the  hope  of  obtaining  the  correct  identity. 


FIGURE  4  -  Identity  Information  Fusion  Architectures 

Nevertheless, the sensor level architecture approach would seem appropriate if the
available  identity  information  is  only  provided  by  self-contained  sensors  producing 
identity  declarations  in  an  autonomous  fashion,  or  by  non-organic  sources  offering 
identity  declarations.  If  one  tentatively  assumes  that  this  last  scenario  is  viable,  studies 
are  necessary  to  explore  the  possibility  of  combining  diverse  identity  declarations  to 
obtain  an  estimate  of  the  identity  of  objects  surrounding  one  or  many  frigate-type  ships. 
Nahim & Pokoski (Ref. 18), Bogler (Ref. 19), Buede et al. (Ref. 20) and Hong & Lynch
(Ref. 21) have shown the quantitative benefits of fusing identity information, more
specifically identity declarations, in examples with limited scope in which fewer than 10
different  identity  declarations  were  combined.  The  gains  translated  into  an  increased 
level  of  performance  in  identifying  objects  using  the  sensor  level  architecture.  In  this 
document,  we  propose  an  approach  which  offers  the  capability  of  combining  identity 
declarations  pertaining  to  an  AWW  environment.  The  focus  of  our  work  will  thus  be 
on  the  fusion  of  identity  declarations. 

2.3  Fusion  of  Identity  Declarations 

To  look  at  the  problem  of  combining  identity  declarations  in  a  rigorous  manner, 
the  following  issues  need  to  be  addressed:  what  identity  declarations  should  be  combined, 
how  should  they  be  combined  and  which  identity  declaration  best  supports  the  combined 
evidence? 

2.3.1  What  Identity  Declarations  to  Combine? 

As  depicted  in  Section  2.1,  the  quantity  of  identity  declarations  available  may 
become  quite  imposing  and  diverse.  A  potential  way  of  organizing  identity  declarations 
might  be  to  use  the  NATO  standards  for  representing  maritime  tactical  information 
(STANAG  4420,  Ref.  22).  These  standards  are  in  the  form  of  charts  that  define  the  full 
range  of  tactical  information  required  by  the  operational  user  at  the  command  level.  The 
charts  were  created  to  establish  the  basis  for  developing  a  standardized  representation  of 
spatially  displayed  tactical  data  using  symbology  and  colour  for  NATO  maritime  units. 

Figures 5a and 5b offer an adapted version of a tactical information hierarchy for
surface  and  air  objects  respectively;  certain  levels  and  entries  have  been  omitted  for 
simplicity.  The  names  in  the  various  boxes,  which  are  indicative  of  the  taxonomy  used  in 
STANAG  4420,  represent  identity  declaration  entities.  These  entities  illustrate  the  sort  of 
identity  declarations  provided  by  various  sources  which  must  be  combined  in  order  to 
obtain  an  estimate  of  an  object's  identity.  The  underlined  elements  symbolize  identity 
declaration  domains;  they  do  not  belong  to  the  STANAG  4420  charts  but  were  added  to 
establish  a  relationship  between  the  domains  and  their  specific  entities. 


FIGURE  5a  -  Hierarchy  of  Tactical  Information  -  Surface  Classification 

FIGURE 5b - Hierarchy of Tactical Information - Air Classification


Some  entities  can  be  further  divided  into  class  and  unit.  These  categories  typify 
the  identity  of  objects  at  two  levels  of  specificity.  The  "unit"  declaration  domain 
characterizes  uniquely  a  detected  object. 

Also  included  in  the  hierarchy  of  tactical  information  is  the  threat  designation;  its 
subdivisions  are  given  in  Fig.  6. 


FIGURE  6  -  Hierarchy  of  Tactical  Information  -  Threat  Category 

The hierarchy of tactical information delineated by Figs. 5a, 5b and 6 should
encompass  many  of  the  identity  declarations  provided  by  the  organic  and  non-organic 
sources  of  Fig.  3.  Furthermore,  because  this  hierarchy  is  a  NATO  standard,  it  provides  a 
better  base  for  achieving  interoperability  in  information  exchange  between  nations  than 
uncontrolled  alternatives. 
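
One possible way to hold such a hierarchy in software is a table of child-to-parent links, from which the most specific identities covered by any declaration can be derived. The following Python sketch is illustrative only. Apart from surface combatant, frigate, City Class and Ville de Quebec, which are the examples given in Section 2.1, the entries and the helper names are hypothetical and do not reproduce the STANAG 4420 charts.

```python
# Child -> parent links. Entries marked "hypothetical" are placeholders added
# for illustration; they are not taken from the STANAG 4420 charts.
PARENT = {
    "surface combatant": "surface",          # root name is hypothetical
    "frigate": "surface combatant",
    "destroyer": "surface combatant",        # hypothetical
    "City Class": "frigate",
    "Ville de Quebec": "City Class",
    "Halifax": "City Class",                 # hypothetical
}


def children(node):
    """Direct descendants of `node` in the hierarchy."""
    return [child for child, parent in PARENT.items() if parent == node]


def leaves(node):
    """Most specific identities covered by a declaration made at `node`."""
    kids = children(node)
    if not kids:
        return {node}
    covered = set()
    for kid in kids:
        covered |= leaves(kid)
    return covered


def ancestors(node):
    """Increasingly general declarations implied by `node`, nearest first."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path
```

Under this representation a declaration of "frigate" is a statement about the set returned by `leaves("frigate")`, while `ancestors("Ville de Quebec")` lists the coarser declarations with which a unique identity is consistent.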

2.3.2  How  to  Combine  Identity  Declarations? 

Given  that  a  hierarchical  structure  has  been  established  between  various  possible 
identity  declarations,  it  would  be  of  interest  to  combine  identity  declarations  pertaining  to 
the  same  object  but  provided  by  various  sources  in  order  to  obtain  a  better  estimate  of  the 
object's  identity.  Before  selecting  appropriate  combination  methods,  certain  issues  need 
to  be  discussed  such  as  sensor  level  fusion  architecture  and  information  uncertainty. 

2.3.2.1  Sensor  Level  Fusion  Architecture 


According  to  Section  2.2,  an  appropriate  approach  to  fusing  identity  declarations 
is  to  apply  the  sensor  level  architecture.  This  architecture  can  be  adapted  to  include 
identity  declarations  inferred  by  operators  as  well  as  the  ones  deduced  by  non-organic 
systems. 


[Diagram: three information sources, Source X, Source Y and Source Z]

FIGURE 7 - Identity Declaration Fusion Process - Sensor Level Architecture

Figure 7 provides a simple view of the sensor level fusion architecture, adapted
from  Refs.  6-7.  In  this  approach,  each  source  infers  identity  declarations  independently 
and  these  inferences  are  then  combined  to  obtain  an  estimated  identity  of  the  observed 
object.  Figure  7  shows  three  dissimilar  information  sources  capable  of  providing  identity 
declarations.  In  source  X,  an  organic  sensor,  the  raw  input  is  transformed  by  the  energy 
processing  unit,  followed  by  the  signal  processing  unit  to  obtain  attribute  information, 
and  finally  by  the  target  processing  unit  that  estimates  identity  declarations,  as  delineated 
in  Fig.  4.  Non-organic  sources  such  as  source  Y  may  directly  provide  identity 
declarations  compatible  with  the  hierarchy  of  tactical  information.  Source  Z  may 
produce  information  concerning  the  identity  of  objects,  though  operator  intervention  may 
be  necessary  to  infer  declarations. 

An  important  feature  of  this  fusion  architecture  is  the  association  process.  As 
mentioned  earlier,  identity  of  an  object  is  meaningless  unless  it  can  be  associated  with  a 
position.  Therefore,  identity  declarations  provided  by  various  independent  sources  must 
be  partitioned  into  groups  representing  observations  belonging  to  the  same  observed 
object.  Algorithms  using  positional  and  identity  information  must  be  applied  to  associate 
declarations  pertaining  to  the  same  object  but  originating  from  different  sources. 

As  for  the  main  process,  called  the  Fusion  Function,  it  must  combine  the  various 
identity  declarations  into  a  single  estimated  identity  declaration. 

2.3.2.2  Identity  Declarations:  Probabilistic  Information

The  target  processing  unit  of  the  sensor  level  architecture  is  able  to  estimate 
identity  declarations  by  comparing  the  sensor  attribute  information  with  an  a  priori 
sensor-specific  database.  This  intelligence  database  may  include  both  identity  and 
kinematic  target  feature  parameters.  The  matching  process  of  the  target  processing  unit  is 
uncertain  due  to  random  type  measurement  errors  and  inference  errors.  Measurement 
errors  may  be  smoothed  out  by  a  signal  processing  unit  as  updating  is  performed. 
Updating  is  a  process  whereby  consecutive  data  from  the  same  sensor,  sufficiently 
separated  in  time  or  frequency  to  be  independent,  are  combined  to  produce  more  reliable 
data.  Inference  errors  occur  when  the  target  processing  unit  infers  identity  declarations  in 
a form specific to the sensor's domain; an exact match between attribute information and
a  specific  element  within  the  sensor's  database  is  quite  unlikely.  The  inference  can  often 
be  of  low  confidence  due  to  the  incompleteness  of  the  data  used  in  the  process.  As  an 
example,  Smith  &  Goggans  (Ref.  10)  state  that  there  are  at  least  two  factors  that  can 
render  information  incomplete  in  radars.  First,  the  signal-to-noise  ratio  (SNR)  is 
sometimes  insufficient  at  a  given  processing  gain  to  obtain  the  desired  measurement. 
Second,  even  in  the  absence  of  noise,  the  coded  signal  does  not  provide  sufficient 
information  to  allow  a  deterministic  solution  to  the  problem.  Visual  observations  provide 
another  typical  example  of  uncertain  information.  There  is  some  level  of  uncertainty 
associated  with  this  kind  of  identification  due  to  the  fact  that  the  object  may  be  partially 
obscured  by  fog,  cloud  or  darkness,  or  may  simply  be  too  far  away  to  make  a  conclusive 
identification  (Ref.  8). 

For  the  purpose  of  this  study,  we  assume  that  all  sources  capable  of  providing 
identity  declarations  will  do  so  by  attaching  to  each  declaration  a  quantitative  measure  of 
uncertainty,  such  as  ‘declaration:  frigate,  measure  of  uncertainty:  0.6’.  As  discussed 
earlier,  the  measure  of  uncertainty  in  the  case  of  sensors  is  associated  with  the 
measurement  and  inference  errors  of  the  target  processing  unit.  This  measure 
corresponds to the probability that the identity declaration and the detected object are the
same or, equivalently, to the probability that declaration i from source s is true:

$$
\begin{aligned}
C_{si} &= P(\text{declaration } i \text{ from source } s \text{ matches detected object}) \\
       &= P(\text{detected object is } i \mid \text{source } s \text{ declared it to be } i) \\
       &= P(\text{declaration } i \text{ from source } s \text{ is true})
\end{aligned}
$$

In the case of non-sensor information sources, the matching coefficient $C_{si}$ simply
typifies a subjective confidence appraisal of the declaration.
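
As a minimal sketch of the assumption above, a declaration can be carried together with its matching coefficient in a small record. The class name `IdentityDeclaration` and its field names are hypothetical and chosen only for illustration; the report does not prescribe a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityDeclaration:
    """One identity declaration with its measure of uncertainty (hypothetical layout)."""

    source: str        # label of the information source, e.g. "X"
    declaration: str   # declared identity, e.g. "frigate"
    confidence: float  # matching coefficient C_si, read as P(declaration is true)

    def __post_init__(self):
        # C_si is a probability, so it must lie in the closed interval [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# The example from the text: 'declaration: frigate, measure of uncertainty: 0.6'.
example = IdentityDeclaration(source="X", declaration="frigate", confidence=0.6)
```

The validation step simply enforces that the coefficient is usable as a probability, whether it comes from a sensor or from a subjective appraisal.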

2.3.2.3  Uncertainty  Fusion  Techniques 

Whatever  options  we  choose  for  combining  identity  declarations,  uncertainty 
techniques  are  necessary.  The  problem  of  combining  identity  declarations  is  simply  one 
of  fusing  uncertain  identity  declarations  rather  than  one  of  inferring  identity  declarations 
from  uncertain  information;  inference  was  accomplished  by  the  various  sources.  Waltz  & 
Llinas (Ref. 4) propose a taxonomy of identity fusion algorithms whereby some methods
infer  and  others  combine  identity  declarations. 

For  the  purpose  of  representing  and  combining  uncertain  information,  many 
approaches  are  available.  They  can  be  divided  into  two  categories:  numerical  and  non- 
numerical  methods.  Among  the  numerical  methods,  we  find  the  Bayesian  paradigm  of 
probability  theory,  the  certainty  factor  approach,  the  Dempster-Shafer  theory  of  evidence, 
the  possibility  theory,  Thomopoulos's  generalized  evidence  processing  theory  (GEP),  etc. 
Examples  of  non-numeric  techniques  are  the  theory  of  endorsements,  reasoned 
assumption  approach,  non-monotonic  logic,  etc.  A  review  of  numeric  and  non-numeric 
methods for handling uncertain information is found in Bhatnagar & Kanal (Ref. 23).

Because  we  are  assuming  that  identity  declarations  from  various  information 
sources  are  independent  from  each  other  and  because  an  inference  method  is  not  required 
here,  the  Bayesian  paradigm  of  probability  theory  (Pearl,  Ref.  2)  and  the  Dempster-Shafer 
evidential  theory  (Shafer,  Ref.  24)  are  appealing  approaches  to  combining  identity 
declarations.  The  first  is  one  of  the  oldest  among  all  numeric  approaches  to  uncertainty. 
The  second,  which  was  conceived  as  a  generalization  of  the  first,  is  well  documented  in 
the  literature.  Both  offer  simplified  algorithms  for  combining  hierarchically  structured 
information  (Refs.  2-3).  These  two  methods  will  be  described,  in  turn,  in  Chapters  3  and 
4  below. 


2.3.3  Decision  Criteria 

Once  identity  declarations  have  been  fused,  a  decision  making  technique  is 
required  to  select  the  identity  declaration  best  supported  by  the  combined  evidence 
(Barnett,  Ref.  25).  Chapter  5  reviews  various  approaches  that  can  be  used  in  the  face  of 
knowledge  combined  by  the  Dempster-Shafer  theory.  Decision  making  will  then  be 
studied  pertaining  to  information  structured  in  a  hierarchical  manner. 

Chapter  6  describes  the  identity  declaration  fusion  function  based  on  the  findings 
of  Chapters  2  to  5  and  provides  an  example  of  application  in  a  naval  context. 

3.0  THE  BAYESIAN  PARADIGM  OF  PROBABILITY  THEORY


The  aim  of  Bayesian  probability  theory  is  to  provide  a  coherent  account  of  how 
belief  should  change  in  light  of  partial  or  uncertain  information.  It  is  thus  an  ideal  vehicle 
for  representing  and  combining  uncertain  information.  Judea  Pearl  (Ref.  2,  p.  29)  gives  a 
good  definition  of  the  Bayesian  method: 

The  Bayesian  method  provides  a  formalism  for  reasoning  about  partial 
beliefs  under  conditions  of  uncertainty.  In  this  formalism, 
propositions  are  given  numerical  parameters  signifying  the  degree  of 
belief  accorded  them  under  some  body  of  knowledge,  and  the 
parameters  are  combined  and  manipulated  according  to  the  rules  of 
probability. 

Before  describing  the  representation  and  combination  rule  of  the  Bayesian 
method,  a  review  of  the  axiomatic  specifications  of  probability  theory  will  be  given, 
followed  by  two  possible  interpretations  of  probability. 

3.1  Axioms  of  Probability 

The modern axiomatic theory of probability is due to Kolmogorov (Refs. 26-27).

Let us consider a probability space $(\Omega, \Pi, P)$ where:

- $\Omega$ is a set, called the sample space, listing all possible distinct outcomes that
may result when a particular experiment is performed.

- $\Pi$ is a $\sigma$-field of subsets of $\Omega$ whose elements $A \in \Pi$ are called events.
To be a $\sigma$-field, $\Pi$ must satisfy the following conditions:

a. $\Omega$ is a member of $\Pi$,

b. $\Pi$ is closed under complementation, i.e.,

$A \in \Pi \Rightarrow \bar{A} \in \Pi$, where $\bar{A} = \Omega - A$,

c. $\Pi$ is closed under countable unions, i.e.,

$A_1, A_2, \ldots, A_n, \ldots \in \Pi \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \Pi$.

- P is a probability function or probability measure if it assigns a number
P(A) to each event $A \in \Pi$ in such a way that:

a. $P(A) \geq 0$ for all $A \in \Pi$,

b. $P(\Omega) = 1$,

c. if the $A_i$ are pairwise disjoint members of $\Pi$, then

$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$, known as countable additivity.

The  above  properties  of  a  probability  measure  P  are  called  Kolmogorov's  axioms 
of  probability.  The  following  basic  consequences  are  direct  results  of  these  axioms: 

$$\forall A \in \Pi,\ 0 \leq P(A) \leq 1.$$

$$\forall A \in \Pi,\ P(A) + P(\bar{A}) = 1. \tag{3.1}$$

These  axioms  clearly  delineate  the  constraints  of  the  probability  measure  P.  They 
are  nevertheless  capable  of  supporting  various  interpretations  of  probability.  The 
following  section  gives  two  particular  definitions  of  probability  and  shows  how  each  is 
supported  by  Kolmogorov's  axioms. 

3.2  Definitions  of  Probability 

The  first  definition  of  probability  that  comes  to  mind  is  the  one  based  on  relative 
frequency.  According  to  that  definition,  the  probability  P(A)  of  an  event  A  stands  for  the 
proportion  of  times  that  this  event  occurred  in  a  large  (ideally  infinite)  number  of 
independent trials carried out under identical experimental conditions. A common
example is the roll of a balanced die where the sample space is $\Omega = \{1, 2, 3, 4, 5, 6\}$ and the
field of subsets $\Pi$ is generated by the elementary events {1}, {2}, {3}, {4}, {5}, {6}.
The probability of obtaining number 1 (event {1}) is considered to be 1/6 because the
relative frequency of number 1 should approach 1/6 when the die is rolled a large
number of times under similar conditions. In a similar fashion, the probability that
numbers 1 or 2 appear should be approximately 2/6. It is important to note that, in
practice, the actual observed frequency of, say, the event {1,2} will vary according to the
number of times the die is rolled and the similarity of the experimental conditions. The
frequency interpretation of probability applies only to problems in which there can be a
large number of similar repetitions of a certain process (Ref. 28). Because the frequency
interpretation of probability satisfies the axioms of probability, as illustrated by the
die-rolling example, probabilities can be viewed, where appropriate, as statistical measures of
proportions. This definition is known as the empirical interpretation of probabilities (Ref.
29); it is also called objective probability.
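
The relative-frequency reading of the die example can be illustrated with a short simulation. This is only a sketch: the function name `relative_frequency`, the fixed seed and the number of rolls are assumptions made for the illustration and are not part of the report.

```python
import random


def relative_frequency(event, n_rolls, seed=0):
    """Fraction of simulated rolls of a balanced die that fall in `event`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) in event)
    return hits / n_rolls


# Observed frequencies of the events {1} and {1, 2} over many rolls.
freq_one = relative_frequency({1}, 100_000)
freq_one_or_two = relative_frequency({1, 2}, 100_000)
```

With a large number of rolls the two values should lie close to 1/6 and 2/6, while a small number of rolls gives visibly noisier frequencies, which is the practical caveat raised in the paragraph above.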

The  second  approach  to  probability  views  it  as  a  degree  or  measure  of  belief  of  an 
individual  about  the  outcome  of  an  experiment.  As  mentioned  by  Bacchus  (Ref.  29), 
probability  then  becomes  an  epistemic  concept,  related  to  an  agent's  beliefs,  instead  of  an 
empirical property related to relative frequency. In this case, probabilities are subjective
because  different  individuals  may  assign  different  probabilities  to  the  same  event, 
possibly  because  their  information  bases  are  different.  However,  a  single  individual 
should  assign  the  same  value  to  the  same  event  (Ref.  30).  For  example  an  individual 
might  want  to  quantify  his/her  beliefs  concerning  the  probability  of  obtaining  number  1 
from the roll of a die; the sample space is again $\Omega = \{1, 2, 3, 4, 5, 6\}$. According to the
individual's  beliefs,  experience  and  information  about  the  experimental  conditions,  he/she 
could  assign  1/3  to  the  likelihood  of  obtaining  number  1.  Conversely,  the  likelihood  of 
not  rolling  a  1  would  be  2/3.  Inasmuch  as  a  person's  judgments  of  the  relative  likelihoods 
of  various  combinations  of  outcomes  satisfy  the  required  consistency  for  his/her 
subjective  probabilities  of  the  different  possible  events  to  be  uniquely  determined,  the 
subjective  interpretation  of  probability  satisfies  the  axioms  of  probability  (Ref.  28). 

In  our  study  of  identity  declaration  fusion,  both  objective  and  subjective  evidence 
is  available.  Objective  evidence  arises  from  sensor  measurements  of  features  which  are 
likely  to  be  corrupted  by  noise  and  other  distortions  (Subsection  2.3.2.2).  Therefore  the 
description  of  the  expected  feature  measurements  for  known  objects  can  be  incomplete  or 
imprecise.  Subjective  evidence  or  judgment  regards  the  occurrence  or  existence  of  a 
particular  object  in  the  coverage  area,  based  upon  a  priori  or  expert  opinion.  Information 
sources  such  as  operator  intervention,  intelligence  reports  and  communication  links  can 
provide  subjective  evidence  (Section  2.1.1).  For  this  reason,  both  types  of  evidence  will 
be  used  in  our  study,  as  they  apply  to  the  Bayesian  approach. 

3.3  The  Bayesian  Paradigm

The  Bayesian  paradigm  is  based  on  three  axioms  of  probability  used  to  describe 
the  degree  of  belief  of  a  proposition  (hypothesis),  given  a  body  of  knowledge  (evidence). 
It  requires  a  set  of  prior  probabilities  to  describe  the  environment.  The  evidence 
pertaining to an event is interpreted in light of the prior probabilities. The result of this
analysis is a set of posterior probabilities (Ref. 31). Pearl's definition of the Bayesian
method,  given  at  the  beginning  of  this  chapter,  combines  the  notions  of  evidence  and 
belief  in  the  following  manner:  a  hypothesis  is  attributed  a  degree  of  belief,  say  p,  given 
some  evidence.  Syntactically,  this  is  written  as: 


$$P(\text{hypothesis} \mid \text{some evidence}) = p$$

Equivalently,

$$P(H \mid E) = p$$

which specifies the belief in hypothesis H under the assumption that evidence E is
known. P(H | E) is called Bayes conditionalization and is calculated by the following
formula:

$$P(H \mid E) = \frac{P(H \cap E)}{P(E)}.$$

In a similar fashion,

$$P(E \mid H) = \frac{P(E \cap H)}{P(H)},$$

and by substitution, we obtain:

$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}. \tag{3.2}$$


This  formula  represents  the  combination  rule  of  the  Bayesian  method.  Equation 
(3.2)  states  that  the  belief  in  hypothesis  H  upon  obtaining  evidence  E  can  be  calculated  as 
the  likelihood,  P(E  |  H),  that  E  will  materialize  given  that  H  is  true,  multiplied  by  the 
previous belief P(H) in H. P(H | E) is often called the posterior probability and P(H) the
prior probability. The denominator P(E) is a normalizing constant which simply ensures that:

$$0 \leq P(H \mid E) \leq 1.$$

3.4  Recursive  Formulation  of  Bayes’  Formula 
Let  us  introduce: 

$E_i$ = evidence: declaration of object of classification type i; i = 1, ..., I
$H_j$ = hypothesis: presence of object of classification type j; j = 1, ..., J

The events $E_1, \ldots, E_I$ are mutually exclusive, meaning that there is no overlap between
them. Let us assume that they are also exhaustive, which implies that they completely
describe the possible events. Likewise, let us suppose that events $H_1, \ldots, H_J$ are mutually
exclusive and exhaustive.


According to Bayes' rule (3.2), we have:

$$P(H_j \mid E_i) = \frac{P(E_i \mid H_j)\, P(H_j)}{P(E_i)} \tag{3.3}$$

where $P(E_i)$ is given by:

$$P(E_i) = \sum_{s=1}^{J} P(E_i \mid H_s)\, P(H_s).$$


Equation 3.3 gives the probability that the object present is of classification type j
given that an object of classification type i was declared. The likelihood matrix $P(E_i \mid H_j)$
can be obtained by allowing the source to observe the objects of interest and make its
declaration often enough to obtain a representative sample (Ref. 1). The number of
elements in each column must be equal to the number of possible declarations. Therefore,

$$\sum_{i=1}^{I} P(E_i \mid H_j) = 1, \quad \forall j \in \{1, \ldots, J\}.$$

Schematically,  the  format  of  the  likelihood  matrix  is  given  in  Table  I  in  the  special 
case  where  I  =  J  =  2. 


TABLE I

Format of the Likelihood Matrix

                      Object present
Declaration     H_1               H_2
E_1             P(E_1 | H_1)      P(E_1 | H_2)
E_2             P(E_2 | H_1)      P(E_2 | H_2)
Sum             1                 1

For  example,  let  the  likelihood  matrix  of  a  given  source  be: 


$$P(E_i \mid H_j) = \begin{pmatrix} 0.8 & 0.4 \\ 0.2 & 0.6 \end{pmatrix}$$

If an object of classification type 2, $H_2$, is presented to the source, then $E_2$ will be the
source response 60 percent of the time, while $E_1$ will be the source response 40 percent of
the time. A "perfect" source would be described by the identity matrix (Ref. 18).
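
Equation (3.3) can be sketched in a few lines for the example likelihood matrix above. The function name `posterior` and the uniform prior `PRIOR` are assumptions introduced for illustration; the report gives the matrix but does not state prior probabilities at this point.

```python
def posterior(likelihood, prior, i):
    """Apply Bayes' rule (3.3): return the list P(H_j | E_i) for j = 0..J-1.

    likelihood[i][j] holds P(E_i | H_j); prior[j] holds P(H_j).
    Indices are zero-based, so i=1 stands for declaration E_2.
    """
    joint = [likelihood[i][j] * prior[j] for j in range(len(prior))]
    evidence = sum(joint)  # P(E_i), the denominator of (3.3)
    if evidence == 0:
        raise ValueError("declaration has zero probability under the prior")
    return [value / evidence for value in joint]


LIKELIHOOD = [[0.8, 0.4],
              [0.2, 0.6]]   # example matrix from the text
PRIOR = [0.5, 0.5]          # assumed uniform prior, not from the report

# Posterior over (H_1, H_2) after the source declares E_2.
post_e2 = posterior(LIKELIHOOD, PRIOR, i=1)
```

Under the assumed uniform prior, a declaration of $E_2$ moves the belief in $H_2$ from 0.5 to 0.75, because $E_2$ is three times as likely under $H_2$ as under $H_1$.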

Let us now assume that the source can produce, in a time sequential manner,
additional information concerning the object in the coverage area. Let $E_i^1$ and $E_k^2$ be
declarations from the source at times 1 and 2. Therefore the conditional probability that
$H_j$ occurs is assessed by taking into account $E_i^1$ and $E_k^2$:

$$P(H_j \mid E_k^2 \cap E_i^1) = P(H_j \mid E_k^2, E_i^1) = \frac{P(E_k^2 \cap E_i^1 \mid H_j)\, P(H_j)}{P(E_k^2 \cap E_i^1)} \tag{3.4}$$


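If we assume that $E_i^1$ and $E_k^2$ are conditionally independent given $H_j$, then all received
information can be combined by simple multiplication (Ref. 32). Equation 3.4 can be
rewritten as follows:

$$P(H_j \mid E_k^2, E_i^1) = \frac{P(E_k^2 \mid H_j)\, P(E_i^1 \mid H_j)\, P(H_j)}{P(E_k^2 \mid E_i^1)\, P(E_i^1)} \tag{3.5}$$

where

$$
\begin{aligned}
P(E_k^2 \mid E_i^1) &= \sum_{s=1}^{J} P(E_k^2 \cap H_s \mid E_i^1) \\
                    &= \sum_{s=1}^{J} P(E_k^2 \mid E_i^1 \cap H_s)\, P(H_s \mid E_i^1) \\
                    &= \sum_{s=1}^{J} P(E_k^2 \mid H_s)\, P(H_s \mid E_i^1).
\end{aligned}
$$

Here $P(E_k^2 \mid H_s) = P(E_k^2 \mid E_i^1 \cap H_s)$ because of conditional independence. Substituting
$P(H_j \mid E_i^1)$ for $\frac{P(E_i^1 \mid H_j)\, P(H_j)}{P(E_i^1)}$ in (3.5), we obtain:

$$P(H_j \mid E_k^2, E_i^1) = \frac{P(E_k^2 \mid H_j)\, P(H_j \mid E_i^1)}{P(E_k^2 \mid E_i^1)} = \frac{P(E_k^2 \mid H_j)\, P(H_j \mid E_i^1)}{\sum_{s=1}^{J} P(E_k^2 \mid H_s)\, P(H_s \mid E_i^1)}$$

where $P(H_j \mid E_i^1)$ is the posterior probability calculated after receiving the first evidence.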


More generally, it follows by induction that:

$$P(H_j \mid E_{i(t)}^{t}, E_{i(t-1)}^{t-1}, \ldots, E_{i(1)}^{1}, E_{i(0)}^{0}) = \frac{P(E_{i(t)}^{t} \mid H_j)\, P(H_j \mid E_{i(t-1)}^{t-1}, \ldots, E_{i(1)}^{1}, E_{i(0)}^{0})}{\sum_{s=1}^{J} P(E_{i(t)}^{t} \mid H_s)\, P(H_s \mid E_{i(t-1)}^{t-1}, \ldots, E_{i(1)}^{1}, E_{i(0)}^{0})} \tag{3.6}$$

where $P(H_j \mid E_{i(0)}^{0}) = P(H_j)$.


To  better  appreciate  the  Bayesian  approach  to  uncertainty  in  terms  of  representation  and 
combination  of  information,  two  simple  applications  are  given  in  Appendix  A  (Section 
A.1). 
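
The recursion (3.6) can be sketched as a loop in which each posterior becomes the prior for the next declaration. The names `update` and `fuse_sequence`, the uniform prior and the declaration sequence are assumptions made for this illustration; the likelihood matrix is the example given earlier in this section.

```python
def update(likelihood, belief, i):
    """One step of (3.6): fold declaration E_i into the current belief vector."""
    weighted = [likelihood[i][s] * belief[s] for s in range(len(belief))]
    total = sum(weighted)  # denominator of (3.6)
    return [w / total for w in weighted]


def fuse_sequence(likelihood, prior, declarations):
    """Apply `update` to a time-ordered list of declaration indices (zero-based)."""
    belief = list(prior)   # P(H_j | E^0) = P(H_j)
    for i in declarations:
        belief = update(likelihood, belief, i)
    return belief


LIKELIHOOD = [[0.8, 0.4],
              [0.2, 0.6]]          # example matrix from the text
PRIOR = [0.5, 0.5]                 # assumed uniform prior

# The source declares E_1 at time 1 and E_1 again at time 2.
belief_after_two = fuse_sequence(LIKELIHOOD, PRIOR, [0, 0])

# Batch form of (3.5) for comparison: multiply both likelihoods, then normalize.
batch = [LIKELIHOOD[0][j] ** 2 * PRIOR[j] for j in range(2)]
batch = [b / sum(batch) for b in batch]
```

Under conditional independence the recursive and batch results agree, which is why only the current posterior needs to be stored between declarations.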

3.5  The  Bayesian  Approach  to  Hierarchical  Evidence 

As  mentioned  in  Subsection  2.3.1,  identity  declarations  originating  from  various 
sources  can  be  represented  in  a  hierarchical  fashion.  The  examples  of  Appendix  A 
(Section A.1) were rather simple in the sense that the evidence was not hierarchically
structured. To accommodate hierarchical evidence using the Bayesian formalism, J. Pearl
(Refs. 2, 33) devised a method that calculates the impact of a piece of evidence on the belief of
every  hypothesis  in  the  hierarchy.  The  definition  of  a  strict  hierarchical  tree  is  given 
below,  followed  by  the  description  of  the  technique  suggested  by  J.  Pearl.  A  numerical 
example  is  given  in  Appendix  A  (Section  A.2). 

3.5.1  Strict  Hierarchical  Tree 

Let us first define $\Omega = \{H_1, H_2, \ldots, H_n\}$, the collection of possible outcomes or
hypotheses known to be mutually exclusive and exhaustive. A collection, $\Psi$, of subsets
of $\Omega$ is chosen to represent specific events or sets that are of interest. A strict
hierarchical tree can be created with the events of $\Psi$ if each set has a unique parent set
that contains it. For example let:

a = fixed wing/military,    d = helicopter/civil,
b = helicopter/military,    e = missile,
c = fixed wing/civil,

and $\Omega = \{a, b, c, d, e\}$,

$\Psi = \{\{a\}, \{b\}, \{c\}, \{d\}, \{e\}, \{a, b\}, \{c, d\}, \{a, b, c, d, e\}\}$.

The strict hierarchy given by the elements of $\Psi$ is represented by Figure 8, which is itself
a subset of Figure 5b. Here, some events of $\Psi$ have been renamed to simplify the
terminology;  for  example  the  event  {fixed  wing/military,  helicopter/military}  is  called 
"military"  and  the  event  {fixed  wing/military,  helicopter/military,  fixed  wing/civil, 
helicopter/civil,  missile}  is  renamed  "air". 

Each set is viewed as a node in the tree. The set $\Omega$ or "air" becomes the root of
the tree, the elementary events or single hypotheses of $\Omega$ are the leaves, and the
intermediate  nodes  represent  the  unions  of  their  immediate  successors. 


air
    military
        fixed wing
        helicopter
    civil
        fixed wing
        helicopter
    weapon
        missile

FIGURE 8 - Strict Hierarchical Tree of Hypotheses - Air
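
The defining property of a strict hierarchical tree (each set has a unique parent set that contains it) can be checked mechanically. The sketch below is one possible reading of that definition: the function name `parent_map` is hypothetical, each set is written as a string of the single-letter labels a to e defined above, and the parent of a set is taken to be its smallest proper superset in the collection.

```python
def parent_map(collection):
    """Return {set: parent set} for a strict hierarchy, or raise ValueError."""
    sets = [frozenset(s) for s in collection]
    roots = [s for s in sets if not any(s < t for t in sets)]
    if len(roots) != 1:
        raise ValueError("a strict hierarchy needs exactly one root")
    parents = {}
    for s in sets:
        supersets = [t for t in sets if s < t]
        if not supersets:
            continue  # the root has no parent
        parent = min(supersets, key=len)
        # In a tree, every other set containing s must also contain its parent.
        if any(not parent <= t for t in supersets):
            raise ValueError(f"{sorted(s)} has no unique parent")
        parents[s] = parent
    return parents


# The collection of the example: singletons, {a, b}, {c, d} and the root.
PSI = ["a", "b", "c", "d", "e", "ab", "cd", "abcde"]
PARENTS = parent_map(PSI)
```

For this collection the parent of {a} is {a, b} ("military") and the parent of {e} is the root ("air"); a collection containing both {a, b} and {a, c} would be rejected because {a} would then have two candidate parents.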


3.5.2  Technique  Suggested  by  J.  Pearl 

The  approach  first  combines  the  evidence  (E),  which  is  in  the  form  of  identity 
declarations,  with  the  current  information  belonging  to  the  specific  hypothesis  (H)  being 
updated.  Secondly,  it  distributes  this  new  information  to  the  other  hypotheses  of  the  tree 
according  to  a  proportionality  rule.  This  last  step  is  necessary  because  the  evidence  has 
an  impact  on  the  belief  of  every  hypothesis  in  the  hierarchical  structure. 

The  combination  of  information  is  performed  using  a  specialization  of  the 
Bayesian  combination  rule  to  the  case  of  dichotomous  alternatives  (3.3)  (the  indices  have 
been  omitted  to  simplify  the  equations): 


$$O(H \mid E) = \Lambda_H\, O(H) \tag{3.7}$$

where $\Lambda_H = \frac{P(E \mid H)}{P(E \mid \bar{H})}$ is the likelihood ratio,

$O(H) = \frac{P(H)}{P(\bar{H})}$ is the prior odds, and

$O(H \mid E) = \frac{P(H \mid E)}{P(\bar{H} \mid E)}$ is the posterior odds.

Because  the  evidences  are  assumed  to  be  conditionally  independent,  we  can  obtain  from 
(3.7)  a  recursive  equation  similar  to  (3.6),  namely: 


$$P(H \mid E^{t}, E^{t-1}, \ldots, E^{1}, E^{0}) = \alpha_H^{t}\, \Lambda_H\, P(H \mid E^{t-1}, \ldots, E^{1}, E^{0}), \tag{3.8}$$

where the normalizing factor $\alpha_H^{t}$ is given by:

$$\alpha_H^{t} = \frac{1}{\Lambda_H\, P(H \mid E^{t-1}, \ldots, E^{1}, E^{0}) + 1 - P(H \mid E^{t-1}, \ldots, E^{1}, E^{0})} \tag{3.9}$$

The likelihood ratio $\Lambda_H$ can be regarded as the degree to which the evidence confirms or
disconfirms the hypothesis; confirmation is expressed by $\Lambda_H > 1$ and disconfirmation by
$\Lambda_H < 1$. The scalar $\alpha_H^{t}$ effectively normalizes (3.8) since $\Lambda_H$ behaves like a weight; a
weight $\Lambda_H$ is given to $P(H \mid E^{t-1}, \ldots, E^{1}, E^{0})$ and a weight of 1 is given to all other
hypotheses belonging to the same hierarchical level.
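
As a quick numerical check of (3.8) and (3.9), consider purely illustrative values that do not come from the report: a current belief $P(H \mid E^{t-1}, \ldots, E^{0}) = 0.2$ and a likelihood ratio of 2. The last line confirms that the result agrees with the odds form (3.7).

```latex
\begin{aligned}
\alpha_H^{t} &= \frac{1}{2(0.2) + 1 - 0.2} = \frac{1}{1.2} \approx 0.833, \\
P(H \mid E^{t}, \ldots, E^{0}) &= \alpha_H^{t} \times 2 \times 0.2 = \frac{0.4}{1.2} \approx 0.333, \\
O(H \mid E) &= 2 \times \frac{0.2}{0.8} = 0.5 = \frac{0.333}{0.667}.
\end{aligned}
```

The belief in H rises from 0.2 to about 0.333, and the remaining 0.8 of belief is scaled by the same factor 0.833 to about 0.667, so the total still sums to 1.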

Once  the  evidence  has  been  combined  with  the  current  hypothesis  information 
using  (3.8),  the  impact  at  other  hierarchical  levels  must  be  evaluated.  To  this  end,  Pearl 
(Ref.  2)  proposes  the  following  rules,  which  involve  the  children  of  a  node  H  (i.e.,  the 
nodes  directly  below  it)  and  the  father  of  node  H  (i.e.,  the  node  directly  above  it). 

A - Impact on the children of H:

Each child of H will be modified by a factor of $\alpha_H^{t} \Lambda_H$. The children's children
and so forth will be modified in a similar fashion.

B - Impact on the father F of H:

$$
\begin{aligned}
P(F \mid E^{t}, E^{t-1}, \ldots, E^{1}, E^{0}) &= \alpha_H^{t}\, P(F \mid E^{t-1}, \ldots, E^{1}, E^{0}) + P(H \mid E^{t}, E^{t-1}, \ldots, E^{1}, E^{0}) \\
&\quad - \alpha_H^{t}\, P(H \mid E^{t-1}, \ldots, E^{1}, E^{0}) \\
&= \alpha_H^{t} \left[ P(F \mid E^{t-1}, \ldots, E^{1}, E^{0}) - P(H \mid E^{t-1}, \ldots, E^{1}, E^{0}) \right] \\
&\quad + P(H \mid E^{t}, E^{t-1}, \ldots, E^{1}, E^{0})
\end{aligned}
$$

The father of the father of H and so forth will be modified in a similar fashion by
substituting the appropriate values for hypotheses H and F.

C - Impact on all other hypotheses:

All other hypotheses will be modified by the normalizing factor $\alpha_H^{t}$.

The  previous  equations  obey  the  rule  stating  that  each  node  of  the  tree  should  acquire  a 
belief  equal  to  the  sum  of  the  beliefs  belonging  to  its  immediate  successors. 
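
Rules A, B and C can be sketched together for a tree whose nodes are stored as sets of leaves. This is a minimal illustration rather than the report's implementation: the function name `pearl_update`, the prior values and the likelihood ratio of 3 are assumptions, and the tree is the "air" hierarchy of Figure 8 rather than Figure 9, whose numerical values are not reproduced in this section.

```python
def pearl_update(beliefs, target, lam):
    """Update every node of a strict hierarchy after evidence bearing on `target`.

    beliefs: dict mapping frozenset of leaves -> current probability of that node.
    target:  the node H receiving the evidence; lam: likelihood ratio of (3.7).
    """
    p_old = beliefs[target]
    alpha = 1.0 / (lam * p_old + 1.0 - p_old)   # normalizing factor (3.9)
    p_new = alpha * lam * p_old                  # updated belief in H (3.8)
    updated = {}
    for node, p in beliefs.items():
        if node <= target:                       # H and its descendants (rule A)
            updated[node] = alpha * lam * p
        elif node > target:                      # ancestors of H (rule B)
            updated[node] = alpha * (p - p_old) + p_new
        elif node.isdisjoint(target):            # all other hypotheses (rule C)
            updated[node] = alpha * p
        else:
            raise ValueError("nodes overlap: not a strict hierarchy")
    return updated


fs = frozenset
AIR, MILITARY, CIVIL = fs("abcde"), fs("ab"), fs("cd")
# Assumed priors (illustrative only); each node equals the sum of its children.
PRIORS = {fs("a"): 0.2, fs("b"): 0.1, fs("c"): 0.3, fs("d"): 0.3, fs("e"): 0.1,
          MILITARY: 0.3, CIVIL: 0.6, AIR: 1.0}

# Evidence supporting "military" with an assumed likelihood ratio of 3.
POSTERIORS = pearl_update(PRIORS, MILITARY, lam=3.0)
```

With these assumed numbers the belief in "military" rises from 0.3 to 0.5625, "civil" falls from 0.6 to 0.375, and the root stays at 1. Note that the two military leaves keep their 2:1 ratio, which is the proportional redistribution criticized later in this chapter.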

Since  the  combination  technique  suggested  by  Pearl  is  analogous  to  the 
combination  rule  of  (3.3),  it  would  have  been  possible  to  use  (3.8)  in  the  example  of 
Appendix  A  (Section  A.l)  if  the  hypotheses  had  been  hierarchical  in  nature.  Figure  9 
provides an example of a strict hierarchical tree of hypotheses: $\Omega = \{C, D, I, K, L, M, N, O, P\}$.
As before, other letters are used to represent unions of these outcomes, e.g. $B = \{C, D\}$,
$G = \{L, M\}$ and $H = \{K, N, O, P\}$. A priori probabilities are indicated for each
set  of  interest.  The  numerical  calculations  are  presented  in  Appendix  A  (Section  A.2). 


FIGURE 9 - Example of Combination and Propagation of Belief Values Using J. Pearl's Technique

3.6  Comments  Concerning  the  Bayesian  Approach 

As  described  earlier,  the  Bayesian  approach  requires  that  all  information  sources 
have  a  complete  and  accurate  knowledge  of  both  the  a  priori  probability  distribution  and 
the  conditional  probability  matrices.  If  the  sources  have  little  or  no  knowledge  about 
such things, the Bayesian approach forces them to guess anyway no matter how
impoverished  the  information  (Ref.  19).  Pearl’s  updating  rules  are  an  example  of  such 
guessing  activities.  In  the  event  where  some  or  all  this  information  is  unavailable,  this 
method  is  at  a  disadvantage  (Ref.  34). 

An additional difficulty associated with the Bayesian approach is the fact that
uniform  prior  probability  distributions  are  often  used  to  represent  complete  ignorance. 
For  this  reason,  there  is  no  way  of  distinguishing  between  instances  of  ignorance  and 
instances  in  which  known  information  suggests  a  uniform  distribution.  Thus,  if  evidence 
supports  proposition  (A  or  B  or  C)  with  probability  0.6,  it  supports  individual 
propositions  (A),  (B),  (C)  with  probability  0.2.  As  a  result,  there  is  a  twofold  support  of 
the disjunction of any two of these propositions over the other. In other words, if

P(A or B or C) = 0.6

and

P(A) = P(B) = P(C) = 0.2

then, for example,

P(A or B) = 0.2 + 0.2 = 0.4 = 2 x P(C).


If the probabilities were assigned on the basis of ignorance, however, then there was no
evidence to indicate that the disjunction (A or B) is more probable than the singleton
(C).  The  only  proposition  supported  by  the  evidence  was  (A  or  B  or  C)  and  there  was  no 
way  of  distinguishing  between  subsets  of  that  event  (Ref.  35). 


Finally,  another  problem  connected  with  probability  coding  of  beliefs  in  general  is 
the  requirement  that  the  probability  of  the  negation  of  a  hypothesis  A,  for  example,  be 
fixed  once  the  probability  of  A  is  known.  Because  of  (3.1),  one  cannot  withhold  belief 
from  a  proposition  without  increasing  belief  in  its  negation  (Ref.  36).  However  a  lack  of 
support  for  a  hypothesis  does  not  necessarily  equate  to  support  in  its  complement.  For 
example,  if 

P(A  or  B  or  C)  =  .6 

then,  in  Bayesian  terms: 

1  -  P(A  or  B  or  C)  =  P(not  (A  or  B  or  C))  =  .4 

However, it is important to recognize that if the evidence received is incomplete, then the
above  implication  between  (A  or  B  or  C)  and  (not  (A  or  B  or  C))  cannot  be  made;  the 
evidence  should  only  support  the  disjunction,  not  refute  it  (Ref.  35). 

3.7  Conclusion 

Probability  theory  and  the  Bayesian  paradigm  of  probability  are  well  formalized 
methodologies.  If  a  priori  probability  distributions  and  conditional  probability  matrices 
are  available  and  if  probabilities  can  be  distributed  to  single  elements,  then  the  Bayesian 
approach  should  be  used.  However,  as  we  have  seen  above,  the  Bayesian  approach 
suffers  from  major  deficiencies  in  a  hierarchical  context,  when  fully  specified  likelihoods 
are  not  available.  Pearl’s  rules  can  be  applied  but  may  not  always  lead  to  an  appropriate 
estimation of the posterior probabilities associated with certain nodes of a hierarchy.
Other  problems  associated  with  the  Bayesian  approach  are  the  coding  of  ignorance  and 
the  strict  requirements  on  the  belief  of  a  hypothesis  and  its  negation. 

Consequently,  the  Bayesian  approach  may  not  be  the  ideal  technique  to  fuse 
uncertain  information  in  the  context  of  identity  estimation  by  multiple  dissimilar  sources. 
We  will  thus  investigate  a  generalization  of  the  Bayesian  approach  which  does  not 
arbitrarily  allocate  probabilities  to  the  children  and  parents  of  a  node  when  this  node  is 
updated with new information. This technique is based on the Dempster-Shafer theory
of  evidence. 

4.0  THE  DEMPSTER-SHAFER  THEORY  OF  EVIDENCE

The Dempster-Shafer theory was developed by Canadian statistician Arthur
Dempster in the 1960s (Ref. 37) and extended by Glenn Shafer in the 1970s (Ref. 24).
As  in  the  Bayesian  approach,  this  theory  supports  the  representation  of  uncertain 
information  and  provides  a  technique  for  combining  it.  The  idea  behind  the  Dempster- 
Shafer  theory  is  best  described  by  Shafer  himself  (Ref.  15): 

The  theory  of  belief  functions  provides  two  basic  tools  to  help  us  make 
probability  judgments:  a  metaphor  that  can  help  us  organize 
probability  judgments  based  on  a  single  item  of  evidence,  and  a  formal 
rule  for  combining  probability  judgments  based  on  distinct  and 
independent  items  of  evidence. 

The Dempster-Shafer technique requires neither prior probabilities nor knowledge of
the capability of each source. Belief is not committed to any specific event or set of
events until evidence is gained. Moreover, belief that is not committed to a given
proposition need not be committed to its negation, and belief committed to a given
proposition need not be committed to any more specific proposition (Ref. 15). The
technique focuses on the probability of a collection of points belonging to the
sample space, whereas classical probability theory is concerned with the probability of
the individual points (Ref. 38).

The Dempster-Shafer technique can express ignorance explicitly. For example, if A
and B are the only hypotheses in the sample space, then P(A) = P(B) = 0.5 indicates that
the beliefs in A and B are the same and that no ignorance about their occurrence exists.
However, if the only available information concerns hypothesis A, with P(A) = 0.5, then a
belief of 0.5 is associated with hypothesis A and the remaining 0.5 is attributed to the
sample space Ω = {A, B}, thereby delineating the ignorance.

The Dempster-Shafer technique does not fully abide by the axioms of probability
as stated in Section 3.1. In particular, this technique is not constrained by equation (3.1)
but rather supports the following restriction:

$$\forall A \subseteq \Omega, \quad P(A) + P(\bar{A}) \le 1. \tag{4.1}$$

Equation  3.1  is  obviously  a  particular  case  of  (4.1),  which  allows  the  Bayesian 
paradigm  to  be  described  as  a  subclass  of  the  theory  of  evidence. 

The  following  sections  describe  in  a  more  rigorous  fashion  the  theory  behind  the 
Dempster-Shafer  evidential  approach  in  terms  of  representation  and  combination  of 
evidence. 

4.1  Representation  of  Evidence 

4.1.1  Terminology 

In the Dempster-Shafer evidential theory, the terminology is slightly different
from that used in probability theory (Ref. 24). The new expressions are in italics. The
frame of discernment Θ is defined as an exhaustive set of mutually exclusive events or
propositions of a particular experiment. It plays the role of the sample space Ω (Section
3.1), so that Θ denotes a set of possible answers to some question where only one answer
is correct. We denote by 2^Θ the set of all subsets of Θ. The elements of 2^Θ, or equivalently
the subsets of Θ, form the class of general propositions and include the empty set ∅,
which corresponds to a proposition that is known to be false, and the whole set Θ, which
corresponds to a proposition that is known to be true. We will assume throughout our
discussion that the frame of discernment is finite.

Let A be a subset of Θ. The evidential theory differentiates between the measure
of  belief  committed  exactly  to  A  and  the  total  belief  committed  to  A.  The  first  is 
characterized  by  the  basic  probability  assignment  and  the  latter  by  the  belief  function. 

A function m is a basic probability assignment if it assigns a number m(A) to
each subset A ∈ 2^Θ, in such a way that:

a. m(∅) = 0.

b. $$\sum_{A \subseteq \Theta} m(A) = 1.$$

The quantity m(A) is called A's basic probability number and represents the exact belief
in the proposition depicted by A.

A function Bel is a belief function if it assigns a number Bel(A) to each subset
A ∈ 2^Θ in such a way that:

a. Bel(∅) = 0.

b. Bel(Θ) = 1.

c. for every positive integer n and every collection A_1, A_2, ..., A_n of
subsets of Θ,

$$\mathrm{Bel}(A_1 \cup \dots \cup A_n) \ge \sum_{\substack{I \subseteq \{1,\dots,n\} \\ I \neq \emptyset}} (-1)^{|I|+1}\, \mathrm{Bel}\Big(\bigcap_{i \in I} A_i\Big).$$

The quantity Bel(A) is called the degree of belief about proposition A. A belief function
assigns to each subset of Θ a measure of the total belief in the proposition represented by
the subset. Here |I| stands for the cardinality of the set I. The simplest belief function is
the one where Bel(Θ) = 1 with Bel(A) = 0 for all A ≠ Θ; this function is called the
vacuous belief function. A belief function can be obtained from the basic probability
assignment:

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B), \quad \text{for all } A \subseteq \Theta. \tag{4.2}$$

A basic probability assignment can in turn be defined as follows with reference to the
belief function:

$$m(A) = \sum_{B \subseteq A} (-1)^{|A - B|}\, \mathrm{Bel}(B), \quad \text{for all } A \subseteq \Theta.$$

In that sense, a basic probability assignment and belief function convey exactly the same
information.

A subset A of Θ is called a focal element of a belief function Bel over Θ if m(A) > 0.
The union of all the focal elements of a belief function is called its core.
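
The correspondence between a basic probability assignment and its belief function can be checked numerically. The following is a minimal sketch, assuming a hypothetical three-element frame and hypothetical masses; subsets are represented as Python frozensets, and the function names are illustrative only.

```python
from itertools import combinations


def powerset(frame):
    """Return every subset of `frame` (empty set included) as a frozenset."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def belief(m, a):
    """Bel(A) as in (4.2): total mass of the focal elements contained in A."""
    return sum(mass for b, mass in m.items() if b <= a)


def mass_from_belief(bel, a):
    """m(A) recovered from Bel by the alternating sum over subsets B of A."""
    return sum((-1) ** len(a - b) * bel[b] for b in powerset(a))


# Hypothetical frame and basic probability assignment (masses sum to one).
frame = frozenset({"x", "y", "z"})
m = {frozenset({"x"}): 0.5, frozenset({"x", "y"}): 0.3, frame: 0.2}

bel = {a: belief(m, a) for a in powerset(frame)}
recovered = {a: mass_from_belief(bel, a) for a in powerset(frame)}

# Focal elements are the subsets with positive mass; the core is their union.
focal_elements = [a for a, mass in m.items() if mass > 0]
core = frozenset().union(*focal_elements)

print(bel[frozenset({"x", "y"})])        # Bel({x, y}) = m({x}) + m({x, y})
print(recovered[frozenset({"x", "y"})])  # should match m({x, y}) up to rounding
```

In this sketch the masses recovered from Bel agree with the original assignment on every subset, which illustrates why the two representations carry the same information.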


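A belief function Bel is called a simple support function S if there exists a
non-empty subset A ⊆ Θ and a number 0 ≤ s ≤ 1 such that:

$$S(B) = \mathrm{Bel}(B) = \begin{cases} 0 & \text{if } B \text{ does not contain } A, \\ s & \text{if } B \text{ contains } A \text{ but } B \neq \Theta, \\ 1 & \text{if } B = \Theta. \end{cases}$$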


Therefore, a simple support function precisely supports the subset A and any subset
containing A to the degree s, but it provides no support for the subsets of Θ that do not
contain A. If S is a simple support function focused on A, then S is the belief function
with basic probability numbers:

a. m(A) = S(A) = s,

b. m(Θ) = 1 - S(A) = 1 - s,

c. m(B) = 0 for all other B ⊆ Θ.

If a simple support function does have a focal element not equal to Θ, then this focal
element is called the focus of the simple support function.

A singleton is a subset of the frame of discernment with only a single member. A
simple support function focused on a singleton is a belief function whose only focal
elements are the singleton and Θ. If all the focal elements are singletons, then the belief
function Bel is called a Bayesian belief function.

A belief function is called dichotomous with dichotomy {A, Ā} if it has no focal
elements other than A, Ā and Θ.

Another function, the commonality function, can be obtained from the basic
probability assignment. A function Q is a commonality function if it assigns a number
Q(A) to each subset A ∈ 2^Θ in such a way that:

a. Q(∅) = 1.

b. $$\sum_{A \subseteq \Theta} (-1)^{|A|}\, Q(A) = 0.$$


The quantity Q(A) is called A's commonality number. It is the sum of basic probability
numbers for the set A and all sets which contain it. A commonality function can be
defined with reference to the belief function:

$$Q(A) = \sum_{B \subseteq A} (-1)^{|B|}\, \mathrm{Bel}(\bar{B}), \quad \text{for all } A \subseteq \Theta, \tag{4.3}$$

or with reference to the basic probability assignment:

$$Q(A) = \sum_{B:\, A \subseteq B} m(B), \quad \text{for all } A \subseteq \Theta. \tag{4.4}$$

A belief function can in turn be obtained as follows from the commonality function:

$$\mathrm{Bel}(A) = \sum_{B \subseteq \bar{A}} (-1)^{|B|}\, Q(B), \quad \text{for all } A \subseteq \Theta. \tag{4.5}$$


Therefore,  the  sets  of  basic  probability  assignments,  belief  functions  and 
commonality  functions  are  in  one-to-one  correspondence  and  each  representation  carries 
the  same  information  as  any  of  the  others  (Ref.  24). 

Yet another function conveys the same information as the belief function; it is
called the plausibility function. Let Bel be a belief function over a frame Θ; a function Pl
is a plausibility function if it assigns a number Pl(A) to each subset A ∈ 2^Θ in such a way
that:

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A}), \tag{4.6}$$

or

$$\mathrm{Pl}(A) = \sum_{B:\, A \cap B \neq \emptyset} m(B), \quad \text{for all } A \subseteq \Theta, \tag{4.7}$$

or

$$\mathrm{Pl}(A) = \sum_{\emptyset \neq B \subseteq A} (-1)^{|B|+1}\, Q(B), \quad \text{for all } A \subseteq \Theta. \tag{4.8}$$

The quantity Pl(A) is known as the degree of plausibility of A and expresses the extent to
which one fails to doubt A. Pl(A) will be zero when the evidence refutes A, and unity
when there is no evidence against A. From equations (4.2) and (4.7) we conclude that:

Bel(A) ≤ Pl(A).

The degree of belief and degree of plausibility summarize the impact of the
evidence on a particular proposition A in the following manner: the first shows how well
the evidence supports proposition A and the second reports on how well its negation Ā
is supported. This information can be expressed in the form of an interval called the
evidential interval whose length, Pl(A) - Bel(A), can be referred to as the ignorance
remaining about subset A:

evidential interval on subset A = [Bel(A), Pl(A)].

If ignorance about subset A is zero, then the Dempster-Shafer process is identical to the
Bayesian approach such that Bel(A) = P(A) = Pl(A); if ignorance about A is equal to one,
then no knowledge at all is available concerning subset A.
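
As an illustration of the evidential interval, the sketch below computes Bel(A) and Pl(A) directly from a basic probability assignment using (4.2) and (4.7). The frame, the identity labels and the masses are hypothetical and serve only as an example.

```python
def belief(m, a):
    """Bel(A) as in (4.2): total mass of the focal elements contained in A."""
    return sum(mass for b, mass in m.items() if b <= a)


def plausibility(m, a):
    """Pl(A) as in (4.7): total mass of the focal elements intersecting A."""
    return sum(mass for b, mass in m.items() if b & a)


def evidential_interval(m, a):
    """Return the pair [Bel(A), Pl(A)]."""
    return belief(m, a), plausibility(m, a)


# Hypothetical frame: a source supports "hostile" to degree 0.6 and leaves
# the remaining 0.4 uncommitted (assigned to the whole frame).
frame = frozenset({"friend", "hostile", "neutral"})
m = {frozenset({"hostile"}): 0.6, frame: 0.4}

a = frozenset({"hostile"})
low, high = evidential_interval(m, a)
ignorance = high - low

print(low, high, ignorance)
```

Here the uncommitted mass is exactly the length of the interval on {hostile}, and Pl(A) coincides with 1 - Bel(Ā) as required by (4.6).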

More generally, the relationship between the Bayesian and Evidential theories can
be described by the following equation:

Bel(A) ≤ P(A) ≤ Pl(A).

Thus, the degree of belief and degree of plausibility on a hypothesis can be seen as the
lower bound and upper bound on the probability of that hypothesis. A simple numerical
example is presented in Appendix B (Section B.1) to clarify the wealth of terminology
associated with the Evidential theory.

4.2 Combination of Evidence

4.2.1  Terminology 

As in the Bayesian theory (see equation 3.2), Evidential theory proposes a
combination rule, called Dempster's rule of combination, which synthesizes basic
probability assignments and yields a new basic probability assignment representing the
fused information. The combination rule is known as the orthogonal sum and is denoted
by ⊕. Basically, this rule corresponds to the pooling of evidence: if the belief functions
being combined are based on entirely distinct bodies of evidence and the set Θ discerns
the relevant interaction between those bodies of evidence, then the orthogonal sum gives
degrees of belief that are appropriate on the basis of the combined evidence (Ref. 24).

Let m1 and m2 be basic probability assignments, on the same frame of
discernment Θ, for belief functions Bel1 and Bel2 respectively. If Bel1's focal elements
are B_1, ..., B_k, and Bel2's focal elements are C_1, ..., C_n, the total portion of belief
exactly committed to A (A ≠ ∅) is given by the orthogonal sum m = m1 ⊕ m2:

$$m(A) = K \sum_{\substack{i,j \\ B_i \cap C_j = A}} m_1(B_i)\, m_2(C_j), \tag{4.9}$$

$$\text{where} \quad 1/K = 1 - \sum_{\substack{i,j \\ B_i \cap C_j = \emptyset}} m_1(B_i)\, m_2(C_j) = \sum_{\substack{i,j \\ B_i \cap C_j \neq \emptyset}} m_1(B_i)\, m_2(C_j). \tag{4.10}$$


The scalar K is a normalizing constant. It normalizes to one the total portion of belief
exactly committed to A because it may occur that a focal element B_i of Bel1 and a focal
element C_j of Bel2 will be such that B_i ∩ C_j = ∅, and

$$\sum_{\substack{i,j \\ B_i \cap C_j = \emptyset}} m_1(B_i)\, m_2(C_j) > 0.$$

The recourse to K is thus justified by the need to compensate for the measure of belief
committed to ∅. If 1/K = 0 then the combined belief functions are said to be totally
contradictory and Bel1 ⊕ Bel2 does not exist or, equivalently, Bel1 and Bel2 are not
combinable. Figure 10 shows the orthogonal sum of two basic probability assignments
m1(B_i) and m2(C_j); the bold lines delineate the total probability mass committed to
B_i ∩ C_j.

FIGURE 10 - Orthogonal Sum of Two Basic Probability Assignments

As mentioned earlier, the result of the orthogonal sum is another basic probability
assignment; the core of the belief function given by m is equal to the intersection of the
cores of Bel1 and Bel2.
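
The orthogonal sum of (4.9) and (4.10) can be sketched in a few lines. The example below is an illustration under stated assumptions rather than a reference implementation: basic probability assignments are stored as dictionaries keyed by frozensets, and the frame and masses are hypothetical.

```python
def dempster_combine(m1, m2):
    """Orthogonal sum of m1 and m2 following (4.9) and (4.10)."""
    products = {}
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            inter = b & c
            if inter:  # products on empty intersections are the conflict
                products[inter] = products.get(inter, 0.0) + mass_b * mass_c
    inv_k = sum(products.values())  # 1/K, second form of (4.10)
    if inv_k == 0.0:
        raise ValueError("totally contradictory evidence: not combinable")
    return {a: value / inv_k for a, value in products.items()}


# Hypothetical evidence: source 1 supports {x}, source 2 supports {y}.
frame = frozenset({"x", "y", "z"})
m1 = {frozenset({"x"}): 0.6, frame: 0.4}
m2 = {frozenset({"y"}): 0.5, frame: 0.5}

m12 = dempster_combine(m1, m2)
m21 = dempster_combine(m2, m1)

for subset, mass in m12.items():
    print(sorted(subset), round(mass, 4))
```

In this example the product 0.6 x 0.5 = 0.3 falls on the empty set, so 1/K = 0.7 and the remaining products are divided by 0.7. Swapping the arguments gives the same result, in line with the commutativity of the rule.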

The operation of orthogonal sum of basic probability assignments satisfies the
following properties (Ref. 39):

1. commutativity: m1 ⊕ m2 = m2 ⊕ m1;

2. associativity: (m1 ⊕ m2) ⊕ m3 = m1 ⊕ (m2 ⊕ m3).

Therefore, the order of combination is immaterial and the operation allows the pairwise
composition of a sequence of basic probability assignments such that, if m1, m2, ..., mp
are p pieces of evidence, their combination is:

m = m1 ⊕ m2 ⊕ ... ⊕ mp.

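If the m_i are a collection of basic probability assignments with focal elements A_i, B_j, C_k,
D_m, ... respectively, then

$$\begin{aligned}
m(A) &= K \sum_{\substack{A_i, B_j, C_k, D_m, \ldots \\ A_i \cap B_j \cap C_k \cap D_m \cap \cdots = A}} \big[ m_1(A_i)\, m_2(B_j)\, m_3(C_k)\, m_4(D_m) \cdots \big], \\
m(\emptyset) &= 0, \\
1/K &= 1 - \sum_{A_i \cap B_j \cap C_k \cap D_m \cap \cdots = \emptyset} \big[ m_1(A_i)\, m_2(B_j)\, m_3(C_k)\, m_4(D_m) \cdots \big].
\end{aligned} \tag{4.11}$$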

It is interesting to note that the formation of orthogonal sums by Dempster's rule
corresponds to the multiplication of commonality functions. If the commonality
functions for Bel1, Bel2 and Bel1 ⊕ Bel2 are denoted by Q1, Q2 and Q, respectively,
then

$$Q(A) = K\, Q_1(A)\, Q_2(A), \quad \text{for all non-empty } A \subseteq \Theta, \tag{4.12}$$

where K does not depend on A. The proof of (4.12) is given in Appendix B (Section B.2).

It is therefore possible to calculate Bel1 ⊕ Bel2 by applying the following procedure:

a. Find the plausibility functions Pl1 and Pl2 from (4.6):

$$\mathrm{Pl}_i(A) = 1 - \mathrm{Bel}_i(\bar{A}).$$

b. Find the commonality functions Q1 and Q2 (from a transformation of (4.8)):

$$Q_i(A) = \sum_{\emptyset \neq B \subseteq A} (-1)^{|B|+1}\, \mathrm{Pl}_i(B). \tag{4.13}$$

c. Find the appropriate normalization constant K by:

c1. Substituting Θ for A in (4.8), so that Pl(Θ) = 1,

c2. Substituting K Q1(B) Q2(B) for Q(B) in (4.8),

c3. Evaluating

$$1/K = \sum_{\emptyset \neq B \subseteq \Theta} (-1)^{|B|+1}\, Q_1(B)\, Q_2(B). \tag{4.14}$$

d. Find the multiplication of commonality functions using plausibility (from (4.8)
and (4.12)):

$$\mathrm{Pl}(A) = K \sum_{\emptyset \neq B \subseteq A} (-1)^{|B|+1}\, Q_1(B)\, Q_2(B). \tag{4.15}$$

e. Find the belief from the resulting plausibility function.

In a similar fashion, the formulas all generalize to the case where more than two
belief functions are combined by replacing Q1(B) Q2(B) by Q1(B) ··· Qn(B). Appendix
B (Section B.3) shows a numerical example of Dempster's rule of combination.
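
The procedure above can be sketched as follows. This is a minimal illustration, not the report's implementation: as a shortcut it computes each commonality function directly from the basic probability assignment via (4.4) instead of passing through the plausibility function as in steps a and b, and the frame and masses are hypothetical.

```python
from itertools import combinations


def subsets(s, include_empty=True):
    """Return the subsets of `s` as frozensets, optionally without the empty set."""
    items = sorted(s)
    start = 0 if include_empty else 1
    return [frozenset(c) for r in range(start, len(items) + 1)
            for c in combinations(items, r)]


def commonality(m, a):
    """Q(A) as in (4.4): total mass of the focal elements that contain A."""
    return sum(mass for b, mass in m.items() if a <= b)


def sign(b):
    """The factor (-1)^(|B|+1) appearing in (4.8), (4.14) and (4.15)."""
    return (-1) ** (len(b) + 1)


def combine_via_commonality(m1, m2, frame):
    """Return Bel for the orthogonal sum of m1 and m2, following steps c to e."""
    q_prod = {b: commonality(m1, b) * commonality(m2, b)
              for b in subsets(frame, include_empty=False)}
    inv_k = sum(sign(b) * q for b, q in q_prod.items())            # (4.14)
    if abs(inv_k) < 1e-12:
        raise ValueError("totally contradictory evidence: not combinable")
    pl = {a: sum(sign(b) * q_prod[b]
                 for b in subsets(a, include_empty=False)) / inv_k
          for a in subsets(frame)}                                  # (4.15)
    return {a: 1.0 - pl[frame - a] for a in subsets(frame)}         # (4.6)


# Hypothetical evidence on a three-element frame.
frame = frozenset({"x", "y", "z"})
m1 = {frozenset({"x"}): 0.6, frame: 0.4}
m2 = {frozenset({"y"}): 0.5, frame: 0.5}

bel = combine_via_commonality(m1, m2, frame)
print(round(bel[frozenset({"x"})], 4), round(bel[frozenset({"x", "y"})], 4))
```

For this evidence, direct application of (4.9) and (4.10) gives m({x}) = 3/7, m({y}) = 2/7 and m(Θ) = 2/7, and the beliefs obtained through the commonality functions agree with those values.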

4.3 Comments Concerning Dempster's Rule of Combination

Various  authors  have  commented  on  Dempster's  rule  of  combination  in  terms  of  its 
justification  and  normalization  inadequacies.  Some  authors  have  even  proposed 
replacements  to  his  combination  rule.  These  concerns  are  summarized  in  the  following 
subsections. 


4.3.1  Requirements  of  Dempster's  Rule  of  Combination 

Dempster's rule of combination is simply a rule for computing a belief function
from two other belief functions. According to Shafer (Ref. 24), this rule reflects the
pooling of evidence within the Dempster-Shafer theory provided that two requirements
are met: the bodies of evidence to be combined have to be independent and the frame of
discernment must discern the relevant interaction of these bodies of evidence. We will
argue that within the framework of the current study, Shafer's requirements are met.

Shafer (Ref. 24) states that Dempster's combination rule seems to reflect the
pooling of evidence provided that the belief functions to be combined are actually based
on entirely distinct bodies of evidence. He later provides further insight into his
independence requirement by suggesting that the items of evidence to be combined must be
independent when viewed abstractly, i.e., before the interactions of their conclusions are
taken into account. However, Shafer has not provided a formal definition of
independence comparable to the probabilistic one, A ⊥ B ⇔ P(A ∩ B) = P(A) P(B).

Voorbraak  (Ref.  40)  has  shown,  using  a  simple  example  (given  in  Appendix  B, 
Section  B.4),  that  even  if  the  evidences  seem  independent  according  to  Shafer’s  definition, 
the  combination  can  produce  counterintuitive  results. 

In the study of identity declaration fusion, if we assume that the information
sources are independent from one another according to Shafer's definition, then the
independence requirement becomes much simpler, since no inference process is applied to
the evidence:

ID fusion example:
    X: declaration from source 1 => A: classification type 1
    Y: declaration from source 2 => A: classification type 1

Here also Bel_X(A) ⊕ Bel_Y(A) will be acceptable, provided that the evidences seem
independent according to Shafer's definition. In what follows, we will assume that for the
problem of identity declaration fusion, Shafer's independence requirement is met.

The second topic to be investigated is the discernment of evidence. Dempster's
rule should only be used if the frame of discernment Θ is fine enough to discern all
relevant interaction of the evidence to be combined. Two definitions are in order to better
understand the concept of discernment of evidence.

Given finite sets Θ and Ω, a mapping ω: 2^Θ → 2^Ω is called a refining if the sets
ω({θ}) constitute a disjoint partition of Ω:

$$\bigcup_{\theta \in \Theta} \omega(\{\theta\}) = \Omega,$$

and the sets ω(A) are given in terms of the ω({θ}):

$$\omega(A) = \bigcup_{\theta \in A} \omega(\{\theta\}), \quad \text{for each } A \subseteq \Theta.$$

Whenever ω: 2^Θ → 2^Ω is a refining, Ω is a refinement of Θ, and Θ is a coarsening of Ω.

Shafer (Ref. 24) states that Dempster's rule of combination may give inaccurate
results when applied in a frame of discernment that is too coarse. Indeed, if S1 and S2 are
support functions over a frame Θ that are based on distinct bodies of evidence, and if Ω is
a coarsening of Θ, then Dempster's rule applied in the frame Θ yields the support
function:

$$\big((S_1 | 2^{\Theta}) \oplus (S_2 | 2^{\Theta})\big) \,\big|\, 2^{\Omega}.$$

However, if it is applied in the frame Ω, the support function becomes:

$$\big((S_1 | 2^{\Omega}) \oplus (S_2 | 2^{\Omega})\big) \,\big|\, 2^{\Omega}.$$

These two support functions may, in fact, differ. This can clearly be seen in the example
of Section B.5 (Appendix B).

4.3.2  Normalization  Inadequacies 

A controversial issue in the Dempster-Shafer theory relates to the normalization of
probabilities and its role in Dempster's rule of combination of evidence (Refs. 41-44). As
mentioned in Section 4.2, normalization compensates for the measure of belief committed
to the empty set. However, normalization can lead to highly counterintuitive results
because it suppresses an important aspect of the evidence. The following theoretical
example, adapted from Zadeh (Ref. 42), illustrates this point.

Let Θ be the frame of discernment of possible diagnoses for patient P: {meningitis,
brain tumor, concussion}. Suppose that the first observation is: doctor X diagnoses that
patient P has either meningitis, with probability 0.99, or brain tumor, with probability 0.01.
The second observation is: doctor Y diagnoses that patient P has either concussion, with
probability 0.99, or brain tumor, with probability 0.01. Applying Dempster's rule, as
shown below, leads to the conclusion that the belief that patient P has brain tumor is 1.0,
which is a highly counterintuitive result, since each doctor assigns brain tumor a
probability of only 0.01.


Each cell gives the intersection of the two focal elements and, below it, the product of
their basic probability numbers.

                               m1({meningitis})   m1({brain tumor})   m1({concussion})   m1(Θ)
                               0.99               0.01                0.0                0.0

m2({meningitis})     0.0       {meningitis}       {}                  {}                 {meningitis}
                               0.0                0.0                 0.0                0.0

m2({brain tumor})    0.01      {}                 {brain tumor}       {}                 {brain tumor}
                               0.0099             0.0001              0.0                0.0

m2({concussion})     0.99      {}                 {}                  {concussion}       {concussion}
                               0.9801             0.0099              0.0                0.0

m2(Θ)                0.0       {meningitis}       {brain tumor}       {concussion}       Θ
                               0.0                0.0                 0.0                0.0


                      Before Normalization     After Normalization (1/K = 0.0001)

m({meningitis})       0.0                      0.0
m({brain tumor})      0.0001                   1.0
m({concussion})       0.0                      0.0
m(Θ)                  0.0                      0.0
m(∅)                  0.9999                   0.0

Yager (Ref. 39) proposed an alternative rule of combination which does not
produce counterintuitive results when there is conflict between pieces of evidence. His
combination rule is denoted by X. Let m1 and m2 be basic probability assignments, on
the same frame of discernment Θ, for belief functions Bel1 and Bel2 respectively. If
Bel1's focal elements are B_1, ..., B_k, and Bel2's focal elements are C_1, ..., C_n, the total
portion of belief exactly committed to A (A ≠ ∅, Θ) is given by the sum m = m1 X m2:

$$\begin{aligned}
m(A) &= \sum_{\substack{i,j \\ B_i \cap C_j = A}} m_1(B_i)\, m_2(C_j), \\
m(\Theta) &= \sum_{\substack{i,j \\ B_i \cap C_j = \Theta}} m_1(B_i)\, m_2(C_j) + K, \\
m(\emptyset) &= 0, \\
K &= \sum_{\substack{i,j \\ B_i \cap C_j = \emptyset}} m_1(B_i)\, m_2(C_j).
\end{aligned}$$

The fundamental distinction between this modified combination rule and
Dempster's original proposal is that under the latter, via the normalization factor, the
belief K, which is the total conflict, is proportionally allocated to the focal elements of m,
so that the contradictory portion is disregarded. With the new rule, the conflicting
portion of the belief is put back into the set Θ, as the contradiction is regarded as
coming from ignorance. Applying Yager's rule to the diagnosis example, we obtain the
following results, which seem intuitively more plausible:


                      Before applying Yager's rule     After applying Yager's rule

m({meningitis})       0.0                              0.0
m({brain tumor})      0.0001                           0.0001
m({concussion})       0.0                              0.0
m(Θ)                  0.0                              0.0 + 0.9999 = 0.9999
m(∅)                  0.9999                           0.0
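
The contrast between the two rules on the diagnosis example can be reproduced with a short sketch. The function names below are illustrative only; the masses are those of the example, with zero-mass entries omitted.

```python
def conjunctive_products(m1, m2):
    """Sum the products m1(B) * m2(C) for each intersection of B and C.

    The key frozenset() collects the conflicting mass (empty intersections).
    """
    out = {}
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            key = b & c
            out[key] = out.get(key, 0.0) + mass_b * mass_c
    return out


def dempster(m1, m2):
    """Dempster's rule: discard the conflict and renormalize the rest."""
    raw = conjunctive_products(m1, m2)
    raw.pop(frozenset(), None)
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("totally contradictory evidence: not combinable")
    return {a: value / total for a, value in raw.items()}


def yager(m1, m2, frame):
    """Yager's rule: move the conflicting mass to the whole frame."""
    raw = conjunctive_products(m1, m2)
    conflict = raw.pop(frozenset(), 0.0)
    raw[frame] = raw.get(frame, 0.0) + conflict
    return raw


frame = frozenset({"meningitis", "brain tumor", "concussion"})
m1 = {frozenset({"meningitis"}): 0.99, frozenset({"brain tumor"}): 0.01}
m2 = {frozenset({"concussion"}): 0.99, frozenset({"brain tumor"}): 0.01}

by_dempster = dempster(m1, m2)
by_yager = yager(m1, m2, frame)

print(by_dempster)  # all the mass ends up on {brain tumor}
print(by_yager)     # {brain tumor} keeps about 0.0001; the frame gets about 0.9999
```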

Yager's rule, applied to the example of Appendix B (Section B.3), produces similar
results to those obtained by Dempster's rule. However, Inagaki (Ref. 45) notes that
Yager's rule is not associative, in that results are dependent on the order in which data are
received. This makes Yager's rule considerably less attractive.

Another option proposed by Yager (Ref. 39) is not to modify Dempster's rule, but
rather to suggest when the latter should be used in order to avoid undesired results. His
methodology considers the combined basic probability assignment m based on p pieces of
evidence to be a good and informative combination if it satisfies the following, rather
subjective, conditions:

a. In formulating m, consider all the information available (m1, m2, ..., mp).

b.  The  information  used  to  obtain  m  must  not  be  highly  conflicting. 

c.  The  specificity  of  m  is  high. 

d. The entropy of m is low (that is, the evidence is consistent).

Specificity and entropy measure the amount of information contained in a basic
probability assignment. Specificity, P_m, measures the degree to which the basic probability
numbers are allocated to focal elements small in size; it provides an indication of the
dispersion of the belief. The higher P_m, the less diverse is the evidence. If we assume that
m is a belief structure defined over the set X and Θ has cardinality n, then:

$$P_m = \sum_{A \subseteq X,\; A \neq \emptyset} m(A) / n_A, \qquad n_A = \mathrm{Card}\, A. \tag{4.16}$$

This quantity is characterized by the following properties (see proof in Appendix B,
Section B.6):

a. 1/n ≤ P_m ≤ 1,

b. P_m = 1/n iff m is vacuous,

c. P_m = 1 iff m is a Bayesian belief function.

Entropy, E_m, measures the degree to which the basic probability numbers are
allocated in a consonant manner, that is, not allocating mass among disjoint sets. It thus
provides a measure of the dissonance of the evidence. The lower E_m, the more consistent
is the evidence. If m is a belief structure defined over the set X, then the entropy measure
is given by:

$$E_m = -\sum_{A \subseteq X} m(A) \ln(\mathrm{Pl}(A)) = \sum_{A \subseteq X} m(A)\, \mathrm{Con}(\mathrm{Bel}, \mathrm{Bel}_A), \tag{4.17}$$

where $\mathrm{Con}(\mathrm{Bel}, \mathrm{Bel}_A) = -\ln(1 - k)$ and $k = \sum_{i,j:\; A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j)$.


The entropy measure is characterized by the following properties (proofs in
Appendix B, Section B.7):

a. If m is a Bayesian belief function, E_m reduces to the Shannon entropy measure,

b. 0 ≤ E_m ≤ ln(n),

c. E_m = 0 if A_i ∩ A_j ≠ ∅ for each pair of focal elements,

d. E_m = ln(n) iff m(A_i) = 1/n for i = 1, 2, ..., n.


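Entropy can also be described by H_m, which is a transformation of E_m:

$$\begin{aligned}
H_m = e^{-E_m} &= e^{\sum_{A} \ln[\mathrm{Pl}(A)]\, m(A)} \\
&= e^{\ln[\mathrm{Pl}(A_1)]\, m(A_1) + \ln[\mathrm{Pl}(A_2)]\, m(A_2) + \cdots} \quad \text{for focal elements } A_i \\
&= \mathrm{Pl}(A_1)^{m(A_1)} \times \mathrm{Pl}(A_2)^{m(A_2)} \times \cdots = \prod_{A \subseteq \Theta} \mathrm{Pl}(A)^{m(A)}.
\end{aligned} \tag{4.18}$$

It can be shown that

a. 1/n ≤ H_m ≤ 1,

b. H_m = 1 if A_i ∩ A_j ≠ ∅ for each pair of focal elements,

c. H_m = 1/n iff m({θ_i}) = 1/n for i = 1, 2, ..., n.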
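
The two measures can be illustrated numerically. The sketch below evaluates P_m from (4.16), E_m from (4.17) and H_m from (4.18) for two extreme cases on a hypothetical four-element frame: the vacuous assignment and a uniform Bayesian assignment. The function names are illustrative only.

```python
import math


def plausibility(m, a):
    """Pl(A): total mass of the focal elements intersecting A, as in (4.7)."""
    return sum(mass for b, mass in m.items() if b & a)


def specificity(m):
    """P_m of (4.16): sum of m(A) / Card(A) over the focal elements."""
    return sum(mass / len(a) for a, mass in m.items() if a and mass > 0)


def entropy(m):
    """E_m of (4.17): minus the sum of m(A) ln Pl(A) over the focal elements."""
    return -sum(mass * math.log(plausibility(m, a))
                for a, mass in m.items() if mass > 0)


def h_measure(m):
    """H_m of (4.18), computed as exp(-E_m)."""
    return math.exp(-entropy(m))


# Hypothetical frame with n = 4 elements.
frame = frozenset({"a", "b", "c", "d"})
vacuous = {frame: 1.0}
uniform_bayesian = {frozenset({x}): 0.25 for x in frame}

for name, m in [("vacuous", vacuous), ("uniform Bayesian", uniform_bayesian)]:
    print(name, specificity(m), entropy(m), h_measure(m))
```

With n = 4, the vacuous assignment reaches the lower bound of P_m and the upper bound of H_m, while the uniform Bayesian assignment reaches the upper bound of P_m and the lower bound of H_m.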

Thus the two measures P_m and H_m are such that the closer they are to unity, the
more informative the evidence. The degree of informativeness of a basic probability
assignment m can be obtained from:

$$I_m = P_m\, H_m, \tag{4.19}$$

where 1/n² ≤ I_m ≤ 1.

4.4  Computational  Complexity 

One drawback of the Dempster-Shafer Evidential Theory is the long calculation
time required by its high computational complexity. Because the combination rule
produces basic probability numbers on the subsets of Θ, the calculation time is
exponential in the number of elements of Θ. In comparison, the Bayesian approach
provides probability statements on the elements of Θ. Therefore, if Θ consists of 4
possible points/outcomes, the definition of a probability function on Θ requires the
assignment of probability to 4 points, whereas the definition of the Dempster-Shafer
basic probability assignment requires the definition of m(A) for 2^4 = 16 subsets A of Θ.

Three  categories  of  options  are  available  to  reduce  computational  complexity.  The 
first  approximates  the  belief  function,  the  second  treats  simple  support  functions  instead  of 
belief  functions,  and  the  last  one  separates  the  frame  of  discernment  into  smaller,  more 
manageable  frames,  one  for  each  set  of  mutually  exclusive  hypotheses.  Table  II 
summarizes  the  various  options  available  for  reducing  the  computational  complexity  of 
Dempster's  combination  rule. 

Voorbraak  (Ref.  46)  has  defined  a  Bayesian  approximation  of  a  belief  function  and 
he  has  shown  that  combining  the  Bayesian  approximations  of  belief  functions  is 
computationally  less  complex  than  combining  the  belief  functions  themselves;  the 
computational  time  will  be  reduced  from  exponential  to  polynomial.  This  approach, 
however,  is  only  appealing  when  one  is  interested  in  final  conclusions  about  the  elements 
of Θ; in the study of identity declaration fusion, we are mostly interested in subsets of Θ.

J. Barnett (Ref. 47) demonstrated that if each piece of evidence consists of simple
support functions focused on singleton propositions and their negations, computational
time will be reduced from exponential to linear. Another option was proposed by Gordon
& Shortliffe (Ref. 48). It is based on the assumptions that (i) each piece of evidence
consists of simple support functions focused for or against subsets of Θ instead of
singletons, and that (ii) the subsets of Θ can be structured in a strict hierarchical tree. This
method builds on Barnett's approach while permitting hierarchical relationships among
hypotheses; its aim is similar to that of Pearl (Section 3.6.2).

TABLE II

Options to Reduce Computational Complexity of Dempster's Rule

Author(s)               Technique                        Calculation

Voorbraak (Ref. 46)     Bayesian approximation of
                        belief function

Barnett (Ref. 47)       Simple support function          Linear time proportional to
                        focused on singleton             |Θ|

Gordon & Shortliffe     Simple support function          Linear time proportional to
(Ref. 48)               using subsets of Θ, evidence     |Θ|
                        hierarchically structured

Shafer (Ref. 49)        Belief functions carried by      Proportional to size of sibs
                        the field of subsets
                        generated by children of
                        node, evidence
                        hierarchically structured

Shafer & Logan          Simple support function          Linear time proportional to
(Ref. 3)                focused on a subset of Θ or      number of nodes in the tree
                        its complement, and carried
                        by the field of subsets
                        generated by children of
                        node, evidence
                        hierarchically structured

However,  combining  negative  evidence  leads  to  computational  difficulties  because 
the  intersection  of  the  complements  of  nodes  may  fail  to  correspond  to  a  node  or  its 
complement.  In  such  a  case,  an  approximation  is  suggested  by  the  authors  but  this 
approximation  restricts  the  usefulness  of  the  plausibility  measure.  The  algorithm  can  be 
implemented  in  a  form  which  is  linear  in  the  number  of  nodes  in  the  tree. 

Cleckner (Ref. 35) offers a comparative study in terms of memory requirements
and computational complexity in which the standard Dempster-Shafer combination rule is
compared with Barnett's algorithm and the alternative due to Gordon & Shortliffe (Ref.
48). Cleckner concluded that when dealing with simple support functions focused on
singleton hypotheses, Barnett's technique is the least computer intensive of the three.

The last category of options, which also assumes that evidence is hierarchically
structured, was proposed by Shafer (Ref. 49). He suggested that the belief functions to be
combined should be carried by a partition P of Θ, which has fewer elements than Θ. This
is done by separating the frame of discernment into smaller more manageable frames, one
for each set of mutually exclusive hypotheses. Of course, once the elements have been
separated into multiple frames, items from different frames can no longer be compared
since they then pertain to different belief functions. Shafer's first approach combines
belief functions each of which is carried by the field of subsets generated by the children of
a particular node. His second approach combines simple support functions focused on a
subset of Θ or its complement (Shafer & Logan, Ref. 3). The latter uses the same type of
evidence as considered by Gordon & Shortliffe (Ref. 48), while avoiding some of its
shortcomings. This approach is detailed in the following section.

4.5  Dempster's  Rule  for  Hierarchical  Evidence,  Revisited  by  Shafer  and  Logan 

The  Shafer  &  Logan  technique  reduces  computational  complexity  in  three  ways: 

a.  By  using  hierarchical  evidence  that  reduces  the  number  of  admissible  subsets; 

b. By using simple support functions focused on a subset of $\Theta$ or its complement;

c. By reducing Dempster's rule of combination to a series of combinations
involving smaller frames of discernment.

The third item is of prime importance. By reducing the frame of discernment $\Theta$
into smaller frames of discernment, the complexity is reduced because the smaller frames
have fewer elements than $\Theta$. However, a constraint has to be imposed for the combination
to be permissible in the smaller frames: the simple support functions and their
combination must be carried by the field of subsets generated by the children of a node.
To understand this constraint, certain terms will be defined in the following subsections.

4.5.1  Partition  of  a  Frame  of  Discernment 

A partition of a frame of discernment $\Theta$ is a set of disjoint non-empty subsets of
$\Theta$ whose union equals $\Theta$; such a partition $\mathcal{P}$ can itself be regarded as a
frame of discernment. $\mathcal{P}^*$ represents the set consisting of all unions of elements
of $\mathcal{P}$. For example, if $\Theta = \{a, b, c, d\}$ and $\mathcal{P} = \{\pi_1, \pi_2\}$
with $\pi_1 = \{a\}$ and $\pi_2 = \{b, c, d\}$, then
$\mathcal{P}^* = \{\emptyset, \{a\}, \{b, c, d\}, \{a, b, c, d\}\}$.
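
The construction of the set of all unions can be sketched in a few lines of Python. This is only an illustration of the definition above; the function name `unions_of_partition` and the use of `frozenset` are assumptions made for the sketch, not part of the report's C++ implementation.

```python
from itertools import combinations


def unions_of_partition(partition):
    """Return P*: the set of all unions of elements of the partition.

    The empty union contributes the empty set, as in the example above.
    """
    blocks = [frozenset(block) for block in partition]
    unions = set()
    for size in range(len(blocks) + 1):
        for chosen in combinations(blocks, size):
            unions.add(frozenset().union(*chosen))
    return unions


# Partition from the text: pi_1 = {a}, pi_2 = {b, c, d}
partition = [{"a"}, {"b", "c", "d"}]
p_star = unions_of_partition(partition)
```

A partition with n elements yields 2^n unions, which is why working on a coarse partition rather than on the full frame keeps the number of admissible subsets small.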

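As defined in Subsection 4.3.1, partition $\mathcal{P}_1$ is a refinement of partition
$\mathcal{P}_2$ if for every element $P_1$ in $\mathcal{P}_1$ there is an element $P_2$ in
$\mathcal{P}_2$ such that $P_1 \subseteq P_2$. For example, let $\mathcal{P}_1$ and
$\mathcal{P}_2$ be two partitions of $\Theta$; let also $\pi_1 = \{a\}$, $\pi_2 = \{b, c, d\}$
and $\pi_3 = \{a, b, c, d\}$. If $\mathcal{P}_1 = \{\pi_1, \pi_2\}$, so that
$\mathcal{P}_1^* = \{\emptyset, \{a\}, \{b, c, d\}, \{a, b, c, d\}\}$, and
$\mathcal{P}_2 = \{\pi_3\}$, so that $\mathcal{P}_2^* = \{\emptyset, \{a, b, c, d\}\}$,
then $\mathcal{P}_1$ is a refinement of $\mathcal{P}_2$.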

If all the focal subsets of a belief function Bel (the subsets with a non-zero basic
probability number) are included in $\mathcal{P}^*$, the belief function Bel is said to be
carried by $\mathcal{P}$. For example, let $\mathcal{P} = \{\pi_1, \pi_2\}$ with
$\pi_1 = \{a\}$ and $\pi_2 = \{b, c, d\}$, and let the belief function Bel be represented
by the basic probability assignments:

m({a, b}) = 0.5

m({a, b, c, d}) = 0.5.

The belief function Bel is not carried by $\mathcal{P}$ since the subset
$\{a, b\} \notin \mathcal{P}^*$, where the latter is defined as before.
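
The two definitions of this subsection (refinement, and a belief function being carried by a partition) can be checked mechanically. The sketch below assumes that a belief function is supplied as a dictionary from focal subsets to basic probability numbers; the helper names are hypothetical and chosen for the illustration only.

```python
from itertools import combinations


def unions_of_partition(partition):
    """Return P*: every union of elements of the partition (including the empty set)."""
    blocks = [frozenset(block) for block in partition]
    return {
        frozenset().union(*chosen)
        for size in range(len(blocks) + 1)
        for chosen in combinations(blocks, size)
    }


def is_refinement(finer, coarser):
    """True if every element of `finer` is contained in some element of `coarser`."""
    return all(any(set(a) <= set(b) for b in coarser) for a in finer)


def is_carried_by(masses, partition):
    """True if every focal element (subset with non-zero mass) belongs to P*."""
    p_star = unions_of_partition(partition)
    return all(focal in p_star for focal, mass in masses.items() if mass > 0)


p1 = [{"a"}, {"b", "c", "d"}]
p2 = [{"a", "b", "c", "d"}]

# Basic probability assignment from the text: m({a, b}) = m({a, b, c, d}) = 0.5
bel = {frozenset("ab"): 0.5, frozenset("abcd"): 0.5}
# A hypothetical assignment whose focal elements all belong to P*
bel_ok = {frozenset("a"): 0.3, frozenset("abcd"): 0.7}
```

With these inputs, `is_refinement(p1, p2)` holds while `is_carried_by(bel, p1)` does not, matching the two examples in the text.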

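Shafer & Logan have shown that if $\mathrm{Bel}_1$ and $\mathrm{Bel}_2$ are both carried by
$\mathcal{P}$, then $\mathrm{Bel}_1 \oplus \mathrm{Bel}_2$ will also be carried by
$\mathcal{P}$. This result has an important impact on the combination of evidence when
using, for example, commonality functions as described in Subsection 4.2.1. In effect,
this conclusion may be transposed to (4.13), (4.14) and (4.15) if it is assumed that
$\mathrm{Bel}_1$ and $\mathrm{Bel}_2$ are both carried by $\mathcal{P}$ and that
$A \in \mathcal{P}^*$. One finds:

$$
\begin{aligned}
Q_i(A) &= \sum_{\substack{B \in \mathcal{P}^* \\ \emptyset \neq B \subseteq A}} (-1)^{|B|_{\mathcal{P}}+1}\, \mathrm{Pl}_i(B) \\
\frac{1}{K} &= \sum_{\emptyset \neq B \in \mathcal{P}^*} (-1)^{|B|_{\mathcal{P}}+1}\, Q_1(B)\, Q_2(B) \\
\mathrm{Pl}(A) &= K \sum_{\substack{B \in \mathcal{P}^* \\ \emptyset \neq B \subseteq A}} (-1)^{|B|_{\mathcal{P}}+1}\, Q_1(B)\, Q_2(B)
\end{aligned}
\tag{4.20}
$$

where $|B|_{\mathcal{P}}$ denotes the number of elements of $\mathcal{P}$ contained in B. The
end result is that the plausibility function Pl for $\mathrm{Bel}_1 \oplus \mathrm{Bel}_2$
can be computed, because the combination is carried by $\mathcal{P}$.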
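
To make the role of the commonality functions concrete, the following Python sketch evaluates the normalizing sum and the combined plausibility on a small partition. It is an illustration only: the partition labels `p1`, `p2`, `p3`, the two mass assignments and the function names are assumptions, and subsets of P* are represented as sets of partition labels so that the cardinality of a set gives the number of partition elements it contains. Totally conflicting inputs (a zero normalizing sum) are not handled.

```python
from itertools import combinations


def nonempty_subsets(blocks):
    """Yield every non-empty union of partition elements, as a frozenset of labels."""
    labels = sorted(blocks)
    for size in range(1, len(labels) + 1):
        for chosen in combinations(labels, size):
            yield frozenset(chosen)


def commonality(masses, subset):
    """Q(A): total mass of the focal elements that contain A."""
    return sum(mass for focal, mass in masses.items() if subset <= focal)


def combined_plausibility(m1, m2, blocks):
    """Plausibility of the orthogonal sum on P*, computed from commonality products."""

    def signed_sum(subset):
        # len(b) counts the partition elements contained in b
        return sum(
            (-1) ** (len(b) + 1) * commonality(m1, b) * commonality(m2, b)
            for b in nonempty_subsets(subset)
        )

    k = 1.0 / signed_sum(frozenset(blocks))
    return {a: k * signed_sum(a) for a in nonempty_subsets(blocks)}, k


# Hypothetical partition with three elements and two simple support functions
blocks = frozenset({"p1", "p2", "p3"})
m1 = {frozenset({"p1"}): 0.6, blocks: 0.4}
m2 = {frozenset({"p2"}): 0.5, blocks: 0.5}
pl, k = combined_plausibility(m1, m2, blocks)
```

For these inputs the conflicting mass is 0.6 x 0.5 = 0.3, so 1/K = 0.7, and the plausibility of `p1` is 5/7, which agrees with a direct application of Dempster's rule.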

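To say that $\mathrm{Bel}_1 \oplus \mathrm{Bel}_2$ is carried by $\mathcal{P}$ is equivalent
to saying that $\mathcal{P}$ discerns the interaction between $\mathrm{Bel}_1$ and
$\mathrm{Bel}_2$. The topic of discernment of evidence was introduced in Subsection 4.3.1.
We say that $\mathcal{P}$ discerns the interaction relevant to itself if:

$$(\mathrm{Bel}_1 \mid 2^{\mathcal{P}}) \oplus (\mathrm{Bel}_2 \mid 2^{\mathcal{P}}) = (\mathrm{Bel}_1 \oplus \mathrm{Bel}_2) \mid 2^{\mathcal{P}}$$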

4.5.2  General  Concepts 

Shafer  &  Logan  (Ref.  3)  have  introduced  a  new  terminology  and  notation  that  will 
greatly  facilitate  the  understanding  of  the  combination  process. 

Let $\mathfrak{A}$ be the collection of all nodes below $\Theta$. As illustrated in Figure 11,
$\mathfrak{A}$ is represented by {C, D, E, F, G, H, I, J, K}. As in Subsection 3.6.2, D is
said to be the child of C, and C is D's parent. The set of nodes that consists of all the
children of a given non-terminal node is called a sib; the sib $\ell_C$ consists of the
children of C.

FIGURE 11 - Hierarchical Evidence to Illustrate $\mathfrak{A}$ and $\ell_C$

It is assumed that for each node A in $\mathfrak{A}$, there is a single dichotomous belief
function $\mathrm{Bel}_A$ with dichotomy $\{A, \bar{A}\}$. Furthermore, $\mathrm{Bel}_A(A)$
and $\mathrm{Bel}_A(\bar{A})$ are both strictly less than one, but either, or both, can be zero.

For each node A in the tree, $\mathrm{Bel}_A^{\downarrow}$ denotes the orthogonal sum of
$\mathrm{Bel}_B$ for all nodes B that are strictly below A.

For each node A in $\mathfrak{A}$, $\mathrm{Bel}_A^{\uparrow}$ denotes the orthogonal sum of
$\mathrm{Bel}_B$ for all nodes B in $\mathfrak{A}$ that are neither below A nor equal to A.

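For example, from Figure 11:

$$\mathrm{Bel}_F^{\downarrow} = \mathrm{Bel}_G \oplus \mathrm{Bel}_H \oplus \mathrm{Bel}_H^{\downarrow} = \mathrm{Bel}_G \oplus \mathrm{Bel}_H \oplus \mathrm{Bel}_I \oplus \mathrm{Bel}_J \oplus \mathrm{Bel}_K$$

and

$$\mathrm{Bel}_C^{\uparrow} = \mathrm{Bel}_F \oplus \mathrm{Bel}_F^{\downarrow}.$$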


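If we generalize to $\Theta$, we have

$$\mathrm{Bel}_\Theta^{\downarrow} = \mathrm{Bel}_C^{\downarrow} \oplus \mathrm{Bel}_C \oplus \mathrm{Bel}_C^{\uparrow}$$

or, equivalently,

$$\mathrm{Bel}_\Theta^{\downarrow} = \mathrm{Bel}_F^{\downarrow} \oplus \mathrm{Bel}_F \oplus \mathrm{Bel}_F^{\uparrow}.$$

In terms of this nomenclature, the aim of Shafer & Logan's approach amounts to
calculating, for every node $A \in \mathfrak{A}$, the values of

$$\mathrm{Bel}_\Theta^{\downarrow} = \oplus \{\mathrm{Bel}_A \mid A \in \mathfrak{A}\},$$

or, equivalently,

$$\mathrm{Bel}_\Theta^{\downarrow} = \mathrm{Bel}_A^{\downarrow} \oplus \mathrm{Bel}_A \oplus \mathrm{Bel}_A^{\uparrow}. \tag{4.21}$$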

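Using the definitions of partition of a frame of discernment and the concept of discernment
of evidence from Subsection 4.5.1, Shafer & Logan have shown that:

a. if $\mathcal{P}$ is a partition of $\Theta$ and $P \in \mathfrak{A} \cap \mathcal{P}$, then
$(\mathrm{Bel}_P^{\downarrow})_{\mathcal{P}} = (\mathrm{Bel}_P^{\downarrow})_{\{P, \bar{P}\}}$,

b. if $\mathcal{P}$ is a partition of $\Theta$, $A \in \mathfrak{A}$ and $\bar{A} \in \mathcal{P}$, then
$(\mathrm{Bel}_A^{\uparrow})_{\mathcal{P}} = (\mathrm{Bel}_A^{\uparrow})_{\{A, \bar{A}\}}$,

c. if $\mathcal{P}$ is a partition of $\Theta$, then $\mathcal{P}$ discerns the interaction
relevant to itself among the belief functions
$\{\mathrm{Bel}_P \mid P \in \mathfrak{A} \cap \mathcal{P}\}$ and
$\{\mathrm{Bel}_P^{\downarrow} \mid P \in \mathfrak{A} \cap \mathcal{P}\}$.

From the characteristics of a partition of a frame of discernment and the definition
of $\mathrm{Bel}_A^{\downarrow}$, we obtain, say, for a partition $\ell_A \cup \{\bar{A}\}$:

$$(\mathrm{Bel}_A^{\downarrow})_{\ell_A \cup \{\bar{A}\}} = \oplus \{(\mathrm{Bel}_B)_{\ell_A \cup \{\bar{A}\}} \oplus (\mathrm{Bel}_B^{\downarrow})_{\ell_A \cup \{\bar{A}\}} \mid B \in \ell_A\}.$$

By a. above, this is equivalent to

$$(\mathrm{Bel}_A^{\downarrow})_{\ell_A \cup \{\bar{A}\}} = \oplus \{\mathrm{Bel}_B \oplus (\mathrm{Bel}_B^{\downarrow})_{\{B, \bar{B}\}} \mid B \in \ell_A\}. \tag{4.22}$$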

Note that if the element B of $\ell_A$ is a terminal node, then $\mathrm{Bel}_B^{\downarrow}$
is vacuous and the orthogonal sum
$\mathrm{Bel}_B \oplus (\mathrm{Bel}_B^{\downarrow})_{\{B, \bar{B}\}}$ reduces to
$\mathrm{Bel}_B$. Equation 4.22 states that in order to determine the degrees of belief of
the children of A resulting from all the evidence bearing on nodes below A, it is
sufficient to consider each child separately. This represents the mechanism for passing
up belief values between different levels of the hierarchy; it satisfies the intuitive
argument to the effect that any evidence confirming a node should also provide evidence
confirming its parent (Ref. 50).

Generalizing for $\Theta$, we obtain:

$$(\mathrm{Bel}_\Theta^{\downarrow})_{\ell_\Theta} = \oplus \{\mathrm{Bel}_B \oplus (\mathrm{Bel}_B^{\downarrow})_{\{B, \bar{B}\}} \mid B \in \ell_\Theta\}. \tag{4.23}$$

The partition becomes $\ell_\Theta$ instead of $\ell_\Theta \cup \{\bar{\Theta}\}$, since
$\ell_\Theta \cup \{\bar{\Theta}\} \subset \ell_\Theta^{*}$.

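A statement even stronger than statement c. above is: if A is a non-terminal
element of $\mathfrak{A}$, then the partition $\ell_A \cup \{\bar{A}\}$ discerns the
interaction relevant to itself among $\mathrm{Bel}_A^{\downarrow}$, $\mathrm{Bel}_A$ and
$\mathrm{Bel}_A^{\uparrow}$. Therefore, exploiting (4.21) for $A \in \mathfrak{A}$ and the
last statement, we obtain:

$$(\mathrm{Bel}_\Theta^{\downarrow})_{\ell_A \cup \{\bar{A}\}} = (\mathrm{Bel}_A^{\downarrow})_{\ell_A \cup \{\bar{A}\}} \oplus (\mathrm{Bel}_A)_{\ell_A \cup \{\bar{A}\}} \oplus (\mathrm{Bel}_A^{\uparrow})_{\ell_A \cup \{\bar{A}\}}$$

or, equivalently:

$$(\mathrm{Bel}_\Theta^{\downarrow})_{\ell_A \cup \{\bar{A}\}} = (\mathrm{Bel}_A^{\downarrow})_{\ell_A \cup \{\bar{A}\}} \oplus \mathrm{Bel}_A \oplus (\mathrm{Bel}_A^{\uparrow})_{\{A, \bar{A}\}} \tag{4.24}$$

since $\mathrm{Bel}_A$ is carried by $\ell_A \cup \{\bar{A}\}$ and
$(\mathrm{Bel}_A^{\uparrow})_{\ell_A \cup \{\bar{A}\}} = (\mathrm{Bel}_A^{\uparrow})_{\{A, \bar{A}\}}$
from statement b. Equation 4.24 states that the evidence from above A and down other
branches affects the degrees of belief of the children of A only through the degree of
belief it provides for or against A itself. This is the mechanism for passing confirming
and disconfirming evidence to the lower levels of the hierarchy.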

4.5.3  The  Shafer-Logan  Algorithm 

The  algorithm  proposed  by  Shafer  &  Logan  (Ref.  3)  is  based  on  the  concepts 
described  in  the  previous  subsection  and  Appendix  C.  The  combination  of  hierarchical 
evidence  is  accomplished  in  four  stages. 

At stage 0, evidence is received for a specific node in the form of a simple support
function or dichotomous function. This evidence is first combined with the existing belief
value associated with the node using Dempster's rule of combination. In this case the
combination is easy to calculate, since the belief functions being combined are simple
support functions or dichotomous functions focused on the same node. The result of this
simple combination is a dichotomous belief function which is then propagated through the
hierarchical tree using stages 1 to 3 of the Shafer-Logan algorithm, described below.
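
The stage 0 combination has a simple closed form, because two belief functions with the same dichotomy can only conflict when one supports the node and the other supports its complement. The sketch below is an illustration under stated assumptions: each belief function is passed as a pair (mass for the node, mass against the node), the remainder being assigned to the whole frame, and the function name and numerical values are hypothetical.

```python
def combine_dichotomous(first, second):
    """Dempster's rule for two belief functions sharing the dichotomy {A, not-A}.

    Each argument is (mass on A, mass on not-A); the rest of the mass sits on Theta.
    Returns the combined (mass on A, mass on not-A).
    """
    a1, b1 = first
    a2, b2 = second
    u1, u2 = 1.0 - a1 - b1, 1.0 - a2 - b2      # uncommitted mass of each source
    agreement = 1.0 - (a1 * b2 + b1 * a2)      # mass not lost to conflict
    if agreement <= 0.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    for_a = (a1 * a2 + a1 * u2 + u1 * a2) / agreement
    against_a = (b1 * b2 + b1 * u2 + u1 * b2) / agreement
    return for_a, against_a


# Existing belief on a node (0.5 for, nothing against) and a new declaration
# (0.4 for, 0.2 against); both sets of numbers are purely illustrative.
for_a, against_a = combine_dichotomous((0.5, 0.0), (0.4, 0.2))
belief_a = for_a
plausibility_a = 1.0 - against_a
```

The result is again dichotomous, so it can be stored on the node as a pair of numbers before the propagation stages are run.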

At stage 1, the dichotomous belief functions attached to terminal nodes are
combined to find degrees of belief for and against their parents; the same is done to the
parents' parents and so on, until there is a dichotomous belief function attached to each
child of A, from which the values of
$(\mathrm{Bel}_A^{\downarrow})_{\ell_A \cup \{\bar{A}\}}$ are obtained. This is performed
using (4.22). This stage calculates degrees of belief by moving up the tree.

At stage 2, we obtain the values for each (direct) child of $\Theta$,
$(\mathrm{Bel}_\Theta^{\downarrow})_{\ell_\Theta}$, using (4.23).

At the last stage (stage 3), the degree of belief of each node is reevaluated to take into
account the influence of other nodes using (4.24); this process calculates the degrees of
belief by moving back down the tree. Figure 12 illustrates the algorithm's flow chart.
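
The values produced by the three propagation stages can be cross-checked on small trees with a brute-force reference that applies Dempster's rule directly on the set of leaves. The sketch below is such a reference, not the Shafer-Logan algorithm itself: its cost grows with the number of subsets rather than linearly with the number of nodes. The tree, the node names and the evidence values are hypothetical.

```python
from functools import reduce


def dempster(m1, m2):
    """Orthogonal sum of two mass functions keyed by frozenset of leaves."""
    raw = {}
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            common = s1 & s2
            if common:  # empty intersections are conflict and are dropped
                raw[common] = raw.get(common, 0.0) + v1 * v2
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("totally conflicting evidence")
    return {s: v / total for s, v in raw.items()}


def belief(m, node):
    """Bel(node): mass of the focal elements contained in the node."""
    return sum(v for s, v in m.items() if s <= node)


def plausibility(m, node):
    """Pl(node): mass of the focal elements that intersect the node."""
    return sum(v for s, v in m.items() if s & node)


# Hypothetical tree: Theta -> C, F; C -> D, E; F -> G, H; H -> I, J, K
nodes = {name: frozenset(name.lower()) for name in "DEGIJK"}
nodes["C"] = nodes["D"] | nodes["E"]
nodes["H"] = nodes["I"] | nodes["J"] | nodes["K"]
nodes["F"] = nodes["G"] | nodes["H"]
theta = nodes["C"] | nodes["F"]

# Two simple support functions: 0.5 for H and 0.4 for C (illustrative numbers)
evidence = [
    {nodes["H"]: 0.5, theta: 0.5},
    {nodes["C"]: 0.4, theta: 0.6},
]
fused = reduce(dempster, evidence)
table = {name: (belief(fused, s), plausibility(fused, s)) for name, s in nodes.items()}
```

In this example the support for H raises the belief of its parent F to the same value (0.375) while the belief of H's children stays at zero, which is the upward behaviour described for (4.22).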

The implementation of the algorithm is not straightforward. The combination of
dichotomous belief functions is performed using the commonality functions of (4.20). The
formulas used to implement the algorithm are reproduced in Appendix C (Section C.1).

The amount of arithmetic involving a particular node depends linearly on the
number of children of the node. Furthermore, the computational complexity of the
algorithm is linear in the number of nodes in the tree. This is the case because the belief
functions being combined are simple support functions focused on nodes or their
complements. However, if we were to combine belief functions carried by the field of
subsets generated by the children of a node, more precisely by a partition
$\ell_A \cup \{\bar{A}\}$ (as suggested by Shafer, Ref. 49), then the amount of arithmetic
would become exponential in the sib size but remain proportional to the number of sibs.

Note that a special case occurs when the belief functions are Bayesian, that is,
when the belief function $\mathrm{Bel}_A$ carried by $\ell_A \cup \{\bar{A}\}$ satisfies:

$$\mathrm{Bel}_A(B \mid A) + \mathrm{Bel}_A(\bar{B} \mid A) = 1$$

and

$$\mathrm{Bel}_A(B \mid \bar{A}) + \mathrm{Bel}_A(\bar{B} \mid \bar{A}) = 1$$

for every element B of the field $(\ell_A \cup \{\bar{A}\})^*$; the arithmetic is then linear
in the sib size. This case is specifically the one devised by J. Pearl and described in
Subsection 3.6.2.

FIGURE 12 - Flowchart for the Shafer-Logan Algorithm (Stage 1: up the tree; top of the tree; Stage 3: down the tree; END)

The Dempster combination rule for combining belief functions, as described in
Subsection 4.2.1, was implemented on an HP workstation using the C++ language. The
Shafer-Logan algorithm for combining simple support functions focused on a subset of
$\Theta$ or its complement was also implemented. Two examples are provided in Appendix C.
The first illustration (Appendix C, Section C.2) shows that if the belief functions are
Bayesian, then Dempster's combination rule gives results similar to those obtained by
Pearl's algorithm. This example uses the strict hierarchical tree illustrated in Figure 9
of Subsection 3.6.3. The a priori probabilities are indicated for each set of interest.
For example,

$m(\{B\}) = m(\{C, D\}) = P(B \mid E_1) = 0.1538$.

Here, the Shafer-Logan algorithm cannot be applied since the a priori probabilities are
not dichotomous in nature.

The second example (Appendix C, Section C.3) illustrates the propagation effect of
combining dichotomous belief functions using the Shafer-Logan algorithm. The same strict
hierarchical tree as above is used. The example is composed of 6 steps; at each step,
additional evidence is received for a specific node in the form of a simple support
function or dichotomous function, and then combined. The new belief (Bel) and
plausibility (Pl) values are calculated for each node. Figures 13 to 18 show the results
of the Shafer-Logan algorithm after adding the evidence of steps 1 to 6, respectively.


FIGURE  13  -  Results  From  the  Shafer-Logan  Algorithm  -  Step  1 


FIGURE 14 - Results From the Shafer-Logan Algorithm - Step 2

A plausibility of 1 for {N} implies that so far no evidence can refute {N}, whereas a
plausibility of .5 for {L} (for example) is obtained through the evidence m($\Theta$) = .5.

The evidence m({H}) = .5 does not influence the children of {H} (in terms of belief and
plausibility) but influences the belief of the parent of {H} and the plausibility of the
other nodes.


FIGURE  15  -  Results  From  the  Shafer-Logan  Algorithm  -  Step  3 

In a similar fashion, strong evidence for {F} does not affect the belief of {B}; however,
it diminishes the plausibility of {B}.

As predicted, the evidence against {F} ($m(\overline{\{F\}}) = .5$) affects the children of {F}.


FIGURE 17 - Results From the Shafer-Logan Algorithm - Step 5

The contradictory evidence greatly modifies the belief and plausibility of {B} and
{F} when the two dichotomous belief functions focused on the same nodes B and F are
combined (m({B}) = .99, m({F}) = .01 with m({B}) = .02, m({F}) = .95, m($\Theta$) = .03).
Dempster's rule leaves only a small non-conflicting mass (1/K ≈ .06 in the notation of
(4.20)), indicating strong conflict. It is interesting to note that this evidence is
dichotomous since $\overline{\{B\}} = \{F\}$.
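
The size of the conflict can be verified by hand. Assuming, as stated, that {F} is the complement of {B}, the only compatible pairs of focal elements are those listed under the braces below, and the non-conflicting mass (the sum that (4.20) denotes 1/K) is:

$$
\frac{1}{K} = \underbrace{0.99 \times 0.02}_{B \cap B} + \underbrace{0.99 \times 0.03}_{B \cap \Theta} + \underbrace{0.01 \times 0.95}_{F \cap F} + \underbrace{0.01 \times 0.03}_{F \cap \Theta} = 0.0198 + 0.0297 + 0.0095 + 0.0003 = 0.0593
$$

For these two belief functions taken alone, the renormalized masses are therefore

$$
m(\{B\}) = \frac{0.0198 + 0.0297}{0.0593} \approx 0.835, \qquad m(\{F\}) = \frac{0.0095 + 0.0003}{0.0593} \approx 0.165, \qquad m(\Theta) = 0.
$$

About 94% of the product mass falls on the empty intersection of {B} and {F} and is discarded, which is why the belief in {B} drops sharply despite the initial value of .99.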


The weak evidence for {B} slightly modifies the belief and plausibility of {B}; the large
remaining uncertainty is distributed amongst the other nodes.

5.0 STATISTICAL DECISION MAKING

An  important  element  to  take  into  consideration  in  the  design  of  an  identity 
declaration  fusion  function  is  the  decision  making  process  required  to  select  the  identity 
declaration  which  best  supports  the  combined  declarations  (Barnett,  Ref.  25).  Statistical 
decision  making  is  necessary  since,  after  fusion,  the  resulting  hierarchical  structure  may 
contain  many  identity  declarations  with  a  non-null  confidence  value.  Because  decisions 
are  subjective,  the  decision  maker  will  undoubtedly  depend  on  his  or  her  own  judgment  as 
well  as  information  collected  from  various  sources.  The  fusion  function  should  still 
suggest  to  the  decision  maker  the  "best"  candidate  or  candidates  according  to 
predetermined  decision  rules.  A  number  of  factors  are  involved  in  the  choice  of  the 
decision  rules  to  be  used  (Nahim  &  Pokoski,  Ref.  18)  : 

-  rule  complexity  (if  the  computational  time  required  to  make  a  decision  is  too 
high,  the  process  becomes  useless), 

-  confidence  level  (decision  rules  are  made  on  a  probabilistic  basis), 

- fusion technique used to combine uncertain information (for example, the
Dempster-Shafer approach produces belief and plausibility values instead of
single probability values),

-  type  of  application  (the  application  environment  can  dictate  decision  rules). 

Before  analyzing  the  specific  needs  of  an  identity  declaration  fusion  function, 
various  approaches  will  be  discussed  concerning  decision  techniques  in  the  face  of 
knowledge  combined  by  the  Dempster-Shafer  theory.  Then,  decision  making  will  be 
studied  pertaining  to  information  structured  in  a  hierarchical  manner,  in  order  to  provide  a 
decision  making  approach  to  the  identity  declaration  fusion  function. 


5.1  Statistical  Decision  Making  Based  on  the  Dempster-Shafer  Representation 

The Dempster-Shafer theory has been applied in various contexts. However, no
general method is acknowledged for classification (decision making) based on basic
probability assignments (Liu & Yang, Ref. 51). It is important to note that the
Dempster-Shafer belief calculus provides two measures for decision making: the belief
(Bel) and plausibility (Pl) measures. Different approaches have been studied based on Bel
and/or Pl; these will be briefly discussed.

Selzer & Gutfinger (Ref. 52) have proposed a method based on the belief measure
Bel accompanied by heuristic rules to choose the best alternative among many possible
alternatives:

- the best alternative must have the maximum basic probability number,

- the difference in basic probability numbers between any two alternatives
must be above a specified threshold.

For their classification problem, Liu & Yang (Ref. 51) also proposed a method
based on the belief measure Bel, but added more rules for selecting the best alternative:

- the best alternative must have the maximum basic probability number,

- the difference between the basic probability number of the best
alternative and the other alternatives should be larger than a threshold,

- the basic probability number for uncertainty, m($\Theta$), should be less than a
certain threshold,

- the basic probability number for the best alternative should be larger than
the basic probability number for uncertainty, m($\Theta$).

If the constraints are not met, the method does not propose a best alternative.
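
The four rules attributed to Liu & Yang translate directly into a small selection function. The sketch below is one possible reading of those rules, not the authors' implementation; the function name, the threshold values and the identity labels are assumptions introduced for the illustration.

```python
def best_alternative(masses, m_theta, gap_threshold, theta_threshold):
    """Return the alternative satisfying the four rules above, or None.

    masses: basic probability number of each alternative
    m_theta: basic probability number assigned to the whole frame (uncertainty)
    """
    if not masses:
        return None
    ranked = sorted(masses.items(), key=lambda item: item[1], reverse=True)
    best, best_mass = ranked[0]                      # rule 1: maximum mass
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_mass - runner_up <= gap_threshold:       # rule 2: clear margin
        return None
    if m_theta >= theta_threshold:                   # rule 3: bounded uncertainty
        return None
    if best_mass <= m_theta:                         # rule 4: best exceeds uncertainty
        return None
    return best


# Hypothetical fused declarations and thresholds
clear_case = best_alternative(
    {"frigate": 0.6, "destroyer": 0.2, "cruiser": 0.05}, m_theta=0.15,
    gap_threshold=0.2, theta_threshold=0.3)
ambiguous_case = best_alternative(
    {"frigate": 0.4, "destroyer": 0.35}, m_theta=0.25,
    gap_threshold=0.2, theta_threshold=0.3)
```

Comparing the best alternative only with the runner-up is sufficient for rule 2, since every other alternative has a mass no larger than the runner-up's.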

In  the  Paramax  (Ref.  7)  study,  a  modified  Dempster-Shafer  approach  was  used  to 
fuse  primarily  attribute  information  in  order  to  obtain  identity  classification.  The 
decision  rule  proposed  was  based  on  the  maximum  basic  probability  number  of  an 
alternative  chosen  among  certain  important  alternatives  which  were  of  tactical  or  strategic 
interest.  Therefore,  not  all  alternatives  were  candidates  for  being  the  best  alternative;  the 
subset  of  candidates  was  selected  according  to  the  scenario  and  mission  at  hand. 

Voorbraak (Ref. 46) suggested the use of the belief interval [Bel({a}), Pl({a})] for
decision making. As there is no unique way of ordering the belief intervals with respect to
their degree of certainty, he proposed four orderings induced by the following rules:

1. The minimal ordering $<_{\min}$ is defined by $[x,y] <_{\min} [x',y']$ iff $y < x'$

2. The ordering by average $<_{\mathrm{av}}$ is defined by $[x,y] <_{\mathrm{av}} [x',y']$ iff
$(x+y)/2 < (x'+y')/2$

3. The belief ordering $<_{\mathrm{Bel}}$ is defined by $[x,y] <_{\mathrm{Bel}} [x',y']$ iff $x < x'$

4. The plausibility ordering $<_{\mathrm{Pl}}$ is defined by $[x,y] <_{\mathrm{Pl}} [x',y']$ iff $y < y'$

The choice for minimal ordering corresponds to a rather cautious approach to the ordering
of elements with respect to their certainty, whereas the choice for the ordering by average
appears rather audacious in nature. The belief ordering simply corresponds to the
maximum basic probability number. The plausibility ordering can play an important role
since it indicates the extent to which the belief may vary. For example,

[Bel({a}), Pl({a})] = [0.5, 0.6] indicates that prob({a}) can vary between 0.5 and 0.6, but

[Bel({b}), Pl({b})] = [0.4, 0.8] indicates that prob({b}) can vary between 0.4 and 0.8.

Therefore, even if Bel({a}) > Bel({b}), the evidence suggests that prob({a}) cannot be
higher than 0.6 whereas prob({b}) could be as high as 0.8.

Voorbraak  (Ref.  46)  does  not  favor  one  ordering  over  the  other  but  mentions  that 
the  plausibility  ordering  can  play  a  dominant  role  in  the  decision  making  process. 
Therefore  in  many  situations,  if  using  the  belief  ordering,  he  suggests  that  the  plausibility 
of  the  best  alternative  be  higher  than  the  plausibility  of  all  the  other  alternatives. 
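
The four orderings are easy to compare on the two intervals used in the example. The sketch below represents a belief interval as a `(belief, plausibility)` pair and uses strict inequalities; the function names are hypothetical.

```python
def less_min(i, j):
    """Minimal ordering: i precedes j only if i lies entirely below j."""
    return i[1] < j[0]


def less_avg(i, j):
    """Ordering by average of the interval end points."""
    return (i[0] + i[1]) / 2 < (j[0] + j[1]) / 2


def less_bel(i, j):
    """Belief ordering: compare lower bounds."""
    return i[0] < j[0]


def less_pl(i, j):
    """Plausibility ordering: compare upper bounds."""
    return i[1] < j[1]


a = (0.5, 0.6)   # [Bel({a}), Pl({a})] from the text
b = (0.4, 0.8)   # [Bel({b}), Pl({b})] from the text

comparisons = {
    "min": (less_min(a, b), less_min(b, a)),
    "avg": (less_avg(a, b), less_avg(b, a)),
    "bel": (less_bel(a, b), less_bel(b, a)),
    "pl": (less_pl(a, b), less_pl(b, a)),
}
```

For these intervals the minimal ordering leaves a and b incomparable, the belief ordering ranks a above b, and both the average and the plausibility orderings rank b above a, which shows how strongly the choice of ordering can affect the decision.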

Barnett (Ref. 25) also adheres to the idea that the measures Bel and Pl should be
used to assist decision making. However, he concentrated his own efforts on problems
where most elements of $\Theta$ have basic probability numbers equal to 0. This occurs when
few sources of evidence are available. In such a case, he argues that Pl
generally provides some discrimination even when the evidence is sparse. Therefore, he
suggests that Pl is a more robust guide to decision making than is Bel. This concept was
applied by Altoft (Ref. 53) to a classification problem in which his main decision criterion
was to choose the alternative with the highest plausibility value, reserving the belief value
for tie breaking.

5.2 Statistical Decision Making Based on the Dempster-Shafer Representation using a Hierarchical Structure

When dealing with a hierarchical structure, the decision making techniques of the
previous section cannot be directly applied because the belief of a parent will always be
equal to, or higher than, the belief of each of its children. Therefore, one cannot simply
rely on the maximum value of belief. For example, in Figure 18, the belief of {F} is
always higher than the belief of {G}, {H} or {I}. Similarly, the belief of {H} is higher
than that of {J} or {K}. Also, the plausibility of a parent will always be equal to, or
higher than, the plausibility of each of its children. An exception to this rule would be if
the decision maker were interested only in the leaf nodes (the elements of the frame of
discernment $\Theta$), in which case the hierarchical structure would be superfluous.

Furthermore,  an  important  aspect  that  should  not  be  forgotten  in  military 
applications,  as  already  mentioned  by  Liu  &  Yang  (Ref.  51)  and  Paramax  (Ref.  7),  is  the 
fact  that  decision  making  is  scenario  and  mission  dependent. 

What  we  propose,  therefore,  is  a  semi-automated  approach  based  on  the  belief  and 
plausibility  values.  It  is  called  semi-automated  because  threshold  values  will  be  applied  to 
the  belief  and  plausibility  measures.  However,  the  final  decision  will  be  taken  by  the 
decision  maker,  because  he/she  remains  an  important  part  of  the  process  and  because  the 
choice  of  the  final  identity  is  typically  scenario  and  mission  dependent.  The  decision 
making  approach  proposed  is  as  follows: 

- select all alternatives with a plausibility value greater than a certain
threshold $T_{Pl}$;

- plot the chosen alternatives according to the hierarchical structure and
indicate for each node its belief value;

- add to the graph all the nodes directly below $\Theta$ with their belief values.

Based on the graph constructed by this decision approach, the decision maker can
select the best alternative according to the highest belief value, the hierarchical level of
interest or any other criterion. The plausibility value is not included in the graph as it is
not deemed to be essential to the decision making process. To illustrate the approach, the
decision technique will be applied to the example of Section C.3 (Appendix C).
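
The three steps of the proposed approach amount to a filter over the nodes of the tree. The sketch below shows one way of producing the data behind the graph; it assumes that belief and plausibility values are already available for every node, and the tree, the numerical values and the function name are hypothetical rather than taken from Appendix C.

```python
def decision_graph(parent, bel, pl, threshold, root="Theta"):
    """Build the display described above.

    Returns (graph, decidable): graph maps each displayed node to
    (parent, belief); decidable is False when only the direct children
    of the root are displayed.
    """
    selected = {node for node, value in pl.items() if value > threshold}
    top_level = {node for node, par in parent.items() if par == root}
    shown = selected | top_level
    graph = {node: (parent[node], bel[node]) for node in sorted(shown)}
    return graph, bool(shown - top_level)


# Hypothetical tree: Theta -> B, F; F -> G, H; H -> J, K
parent = {"B": "Theta", "F": "Theta", "G": "F", "H": "F", "J": "H", "K": "H"}
bel = {"B": 0.1, "F": 0.7, "G": 0.05, "H": 0.6, "J": 0.3, "K": 0.1}
pl = {"B": 0.3, "F": 0.9, "G": 0.2, "H": 0.8, "J": 0.65, "K": 0.4}

graph, decidable = decision_graph(parent, bel, pl, threshold=0.6)
```

Only the belief values are carried into the graph, in line with the choice of leaving plausibility out of the display; the final selection among the displayed nodes is left to the decision maker.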

The plausibility threshold $T_{Pl}$ is chosen equal to .6. At each step, the decision
technique creates a graph from which the decision maker can select the best alternative
according to his/her needs. If no nodes other than the direct children of $\Theta$ appear in
the graph, then no alternative has a plausibility value higher than the threshold $T_{Pl}$
and no decision can be taken. The results are as follows:

STEP 6:


As indicated in Chapter 4, the contradictory evidence at step 5 greatly modifies the
belief of {B} and {F}. The decision technique reproduces the effect of this contradictory
evidence.

6.0 IDENTITY DECLARATION FUSION FUNCTION

We  are  now  able  to  provide  an  identity  declaration  fusion  function  based  on  the 
various  concepts  studied  in  the  previous  chapters. 

The  first  section  describes  the  identity  declaration  fusion  function  and  the  second 
section  provides  an  example  using  the  identity  declaration  fusion  function  in  which  identity 
declarations  will  be  fused  and  the  decision  making  process  applied. 

6.1  Description  of  the  Identity  Declaration  Fusion  Function 

Within  the  framework  of  our  identity  information  fusion  study,  various  hypotheses 
were  made  and  delineated  in  the  previous  chapters;  the  two  hypotheses  which  bear  the 
most  impact  on  the  choice  of  the  fusion  approach  are  as  follows: 

1. The bodies of evidence provided by the various information sources are
independent according to Shafer's definition.

2. Probabilistic information is only available for some of the events associated
with $\Theta$, such that the likelihood matrix is not fully specified.

These  hypotheses  suggest  that  the  Dempster-Shafer  theory  of  evidence  is  an 
appropriate  technique  to  fuse  uncertain  information.  Because  we  have  chosen  to  represent 
identity  declarations  in  a  hierarchical  manner,  the  algorithm  proposed  by  Shafer  and  Logan 
is  appealing,  both  in  terms  of  the  information  structure  and  computational  requirements. 

An interesting issue which distinguishes this study from other studies on identity
fusion is the fact that two different frames of discernment are introduced to estimate the
identity of the observed objects. The first frame of discernment is the hierarchical tree of
surface and air classifications, as shown in Figures 5a and 5b, and the second one deals
with the threat categories as described in Figure 19. The purpose of this approach is to
circumvent the problem whereby the threat category of a detected object is directly
inferred from its identity. For example, in Figure 19 the Exocet missile is automatically
assumed friendly. During the Falklands war, however, the Exocet missile was definitely
not considered friendly to the British Navy. By eliminating false automated inferences, this
approach allows more freedom in the decision making.


FIGURE 19 - Example of Frame of Discernment Where Threat Category is Directly Inferred From Identity

To accommodate a hierarchical structure, Figure 6 must be modified, as shown in
Figure 20, to include $\Theta_2$. The "pending" subdivision was eliminated since an uncertainty
value for this element would not be available. Also, the "suspect" and "assumed friend"
subdivisions had to be eliminated because they do not form a set of mutually exclusive
events with the "hostile" and "friend" subdivisions (according to the definition of a frame
of discernment). In our opinion, the two frames of discernment $\Theta_1$ and $\Theta_2$ of
Figure 20 could encompass most of the identity declarations within the naval environment.

A general fusion approach is now proposed based on the concepts discussed in the
previous chapters.

FIGURE 20 - The Two Frames of Discernment Used in the Identity Declaration Fusion Function


The  most  basic  concept  to  which  the  study  adheres  is  the  fact  that  information 
sources  are  self  contained  and  that  each  one  represents  a  local  decision  node  capable  of 
identifying  a  detected  object  (Section  2.2).  The  aim  is,  therefore,  to  combine  local 
decisions  in  the  hope  of  obtaining  the  correct  identity  with  a  high  probability.  Dasarathy 
(Ref.  54)  calls  this  type  of  fusion  “Decision  In-Decision  Out  Fusion”  because  both  the 
input  and  output  are  decisions.  An  appropriate  architecture  to  delineate  this  concept  is 
sensor  level  architecture  (Subsection  2.3.2.1).  Figure  21  reproduces  a  simplified  version 
of  sensor  level  architecture  from  Figure  7. 

In  the  case  of  multiple  objects  in  the  detection  environment,  the  association 
process  matches  the  received  identity  declaration,  originating  from  an  information  source, 
with  one  of  the  observed  objects.  Identity  declaration  is  one  of  many  components  that 
characterize  a  detected  object;  these  components  form  the  state  vector  of  the  object  and 
are  called  a  track.  Each  detected  object  is  typified  by  its  track.  In  the  event  that  an 
information  source  provides  identity  declaration  for  an  existing  track,  the  identity 
declaration  fusion  function  becomes  necessary  to  combine  these  declarations. 
Consequently, the fusion process is applied to each track based on the frame of
discernment (Θ1, Θ2 or both) according to the type of identity declarations to combine.
For each track, the frames of discernment are identical; however, the belief and plausibility
values vary according to the weight of the evidence received.

FIGURE 21 - Identity Declaration Fusion Process - Sensor Level Architecture

It is noteworthy that the order in which bodies of evidence are received is
inconsequential, since the orthogonal sum of the Dempster-Shafer technique is
commutative and associative. Therefore, if two information sources simultaneously
provide identity declarations on the same observed object, combining these
evidences with the existing information yields identical belief and plausibility values
regardless of which evidence is added first.
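
The order-independence property can be checked numerically. The sketch below is only an illustration: the three-element frame and the three simple support functions are hypothetical values chosen for the check, and `combine` is a brute-force version of Dempster's orthogonal sum rather than the hierarchical algorithm used in this document.

```python
from functools import reduce
from itertools import permutations, product

# Hypothetical three-element frame, used only for this illustration.
THETA = frozenset({"air", "surface", "subsurface"})

def combine(m1, m2):
    """Dempster's orthogonal sum of two mass functions (frozenset -> mass)."""
    raw = {}
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        raw[a & b] = raw.get(a & b, 0.0) + x * y
    conflict = raw.pop(frozenset(), 0.0)  # mass that fell on the empty set
    return {s: v / (1.0 - conflict) for s, v in raw.items()}

def simple_support(focus, s):
    """Mass s on `focus`, the remainder on the whole frame."""
    return {frozenset(focus): s, THETA: 1.0 - s}

evidence = [
    simple_support({"air"}, 0.5),
    simple_support({"surface"}, 0.3),
    simple_support({"air", "surface"}, 0.4),
]

# Fuse the same three bodies of evidence in all six possible orders.
results = [reduce(combine, order) for order in permutations(evidence)]

def same(m1, m2, tol=1e-12):
    """True when two mass functions agree on every focal element."""
    keys = set(m1) | set(m2)
    return all(abs(m1.get(k, 0.0) - m2.get(k, 0.0)) < tol for k in keys)

order_independent = all(same(results[0], r) for r in results[1:])
print(order_independent)
```

Every ordering produces the same mass function up to floating-point rounding, which is why the fusion function does not need to queue or sort incoming declarations.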

If  we  assume  that  the  association  function  performs  perfectly,  the  fusion  function 
can  be  illustrated  by  the  flowchart  of  Figure  22.  Its  major  processes  are  outlined  below. 

1. As described in Subsection 2.3.2.2, it was assumed that all sources capable of
providing identity declarations will do so by attaching to each declaration a
quantitative measure of uncertainty. This measure corresponds to the probability
that the identity declaration and detected object are matched or, equivalently, to
the probability that declaration i from source s is true:

    C_{s,i} = P(declaration i from source s matches detected object)
            = P(declaration i from source s is true).

In the case of non-sensor information sources, the matching coefficient C_{s,i} simply
typifies a subjective confidence appraisal of the declaration.

2. It is further assumed that information sources provide single identity declarations
as opposed to multiple declarations:

    example of single identity declaration: military fixed wing, C_{s,i} = 0.7

    example of multiple identity declaration: military fixed wing, C_{s,i} = 0.4
    and civil fixed wing, C_{s,i} = 0.1.

Thus, it is simple to transform the probability of a true declaration into a
dichotomous or simple support function. Effectively, if the single declaration is
military fixed wing with C_{s,i} = 0.7, then we obtain the following simple support
function:

    m(military fixed wing) = 0.7
    m(Θ) = 0.3.

If the single declaration is military fixed wing with C_{s,i} = 0.7 and not military fixed
wing with C_{s,i} = 0.1, then we obtain the following dichotomous belief function:

    m(military fixed wing) = 0.7
    m(not military fixed wing) = 0.1
    m(Θ) = 0.2.

3. The dichotomous or simple support function is then combined with the belief
value, or more precisely the dichotomous belief function of the same focal element
within the hierarchical tree; this is accomplished using Dempster's combination rule
as explained at stage 0 of Subsection 4.5.3. However, as shown in Subsection
4.3.2, the normalization inadequacies of Dempster's combination rule can pose a serious
problem. Unfortunately, Yager's degree of informativeness cannot be applied since
we are dealing with simple support or dichotomous functions. We could,
however, easily determine the degree of conflict when combining simple support or
dichotomous functions; this can be accomplished at stage 0, as described in
Subsection 4.5.3. Therefore, if the normalizing constant K is greater than an
appropriate threshold value, the information is said to be in relative agreement and
Dempster's combination rule can be applied. If the normalizing constant is smaller
than the threshold, the bodies of evidence are conflicting, suggesting that one of
the assessments is unreliable, signaling a potential information source problem
(Abdulghafour & Abidi, Ref. 55). It is, therefore, suggested that the combination
be suspended pending verification of the information sources.

If the combination is valid, the result of the combination is propagated through the
hierarchical tree using the Shafer-Logan algorithm.

FIGURE 22 - Flowchart of Identity Declaration Fusion Function
(input: evidence, i.e. an identity declaration and the matching coefficient C_{s,i} of source s
with declaration i; output: estimated identity of observed object)

4.  The  last  phase  of  the  fusion  function  performs  the  semi-automated  decision  making 
process  in  the  sense  that  the  decision  maker  must  select  the  best  alternative  among 
a  limited  subset  of  likely  identity  declarations. 
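
Steps 1 to 3 above can be sketched for a single node of the tree. This is a minimal illustration under stated assumptions, not the report's implementation: a dichotomous function is stored as the pair (m(A), m(not A)), the agreement measure is taken to be one minus the conflicting mass, the threshold value 0.5 is hypothetical, and the function names are invented for the example.

```python
AGREEMENT_THRESHOLD = 0.5  # hypothetical value; the report leaves the threshold open

def simple_support(c):
    """Single declaration with matching coefficient c -> (m(A), m(not A))."""
    return (c, 0.0)

def dichotomous(c_for, c_against):
    """Declaration for A and against A -> (m(A), m(not A))."""
    return (c_for, c_against)

def fuse(stored, new, threshold=AGREEMENT_THRESHOLD):
    """Combine two dichotomous functions on the same node with Dempster's rule.

    Returns the combined (m(A), m(not A)), or None when the non-conflicting
    mass falls below the threshold (combination suspended).
    """
    (p1, n1), (p2, n2) = stored, new
    agreement = 1.0 - (p1 * n2 + n1 * p2)  # one minus the conflicting mass
    if agreement < threshold:
        return None
    plus = 1.0 - (1.0 - p1) * (1.0 - p2) / agreement
    minus = 1.0 - (1.0 - n1) * (1.0 - n2) / agreement
    return (plus, minus)

# Node currently holds m(A) = 0.7, m(not A) = 0.1; a new declaration supports A at 0.6.
fused = fuse(dichotomous(0.7, 0.1), simple_support(0.6))

# Two strongly opposed declarations: the combination is suspended.
suspended = fuse(dichotomous(0.9, 0.0), dichotomous(0.0, 0.9))

print(fused, suspended)
```

In the first case only 0.06 of the mass is conflicting, so the rule is applied; in the second case 0.81 of the mass is conflicting and the sketch returns None so that the sources can be verified before anything is propagated through the tree.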

6.2  Example  of  the  Fusion  Function  Applied  to  the  Problem  of  Identity  Declarations 

The following example illustrates the use of the Shafer-Logan algorithm for the specific
problem of combining identity declarations using two frames of discernment. The first
frame of discernment Θ1 is detailed in Figure 23. The hierarchy has been simplified from
that of Figures 5a and 5b. The names of the leaf nodes (for example MiG-19) were taken
from Refs. 56-57 and include both friendly and hostile elements. The second frame of
discernment Θ2 is exactly as described in Figure 20; in other words, the leaf nodes are
those of Figure 20.

The  scenario  is  as  follows:  the  commander  of  a  Canadian  Patrol  Frigate-type  ship 
receives  a  series  of  identity  declarations  concerning  one  object;  he/she  must  determine  the 
identity  of  the  object  and  take  action. 

As  in  the  previous  examples,  the  belief  and  plausibility  values  at  each  node  of  the 
two  hierarchical  trees  are  zero.  The  evidences  received  from  various  information  sources 
are  given  in  Table  III  (assuming  that  the  order  of  the  received  evidences  is  unimportant): 

TABLE III

Evidences received from various sources

Frame of discernment Θ1                        Frame of discernment Θ2

m({fighter}) = 0.4          m(Θ) = 0.6         m({unknown}) = 0.2    m(Θ) = 0.8
m({carrier}) = 0.7          m(Θ) = 0.3         m({friend}) = 0.6     m(Θ) = 0.4
m({fixed-wing}) = 0.3       m(Θ) = 0.7         m({hostile}) = 0.7    m(Θ) = 0.3
m({non-combatant}) = 0.8    m(Θ) = 0.2         m({neutral}) = 0.3    m(Θ) = 0.7
m({air}) = 0.5              m(Θ) = 0.5         m({hostile}) = 0.8    m(Θ) = 0.2
m({air}) = 0.1              m(Θ) = 0.9
m({MiG-25}) = 0.1           m(Θ) = 0.9
m({helicopter}) = 0.5       m(Θ) = 0.5
m({MiG-19}) = 0.6           m(Θ) = 0.4
m({missile}) = 0.7          m(Θ) = 0.3
m({MiG-25}) = 0.4           m(Θ) = 0.6

The evidences are the only ones received concerning the object; a decision could be taken
after each evidence if the plausibility of at least one node is greater than T_Pl. However,
we have chosen to combine all the evidences before the decision making process. In this
example, T_Pl will be set to 0.5. Evidences m({air}) = 0.5 and m({air}) = 0.1 are
contradictory but not enough for Dempster's combination rule to produce irregular
results. Figures 24 and 25 show the belief and plausibility measures for each node after the
combination of evidences for frames of discernment Θ1 and Θ2, respectively. Figures
26 and 27 are the graphs available to the decision maker. It seems that the object is
airborne and hostile, and there is a fairly good chance that it is a fixed wing fighter.
According to the mission, the decision maker will choose the best alternative and take
appropriate action if necessary.

FIGURE 23 - Frame of Discernment Θ1 Used in Example
FIGURE 24 - Results From Combining Evidences Using Θ1
FIGURE 25 - Results From Combining Evidences Using Θ2
FIGURE 26 - Graph Representing Best Alternatives Using Θ1
FIGURE 27 - Graph Representing Best Alternatives Using Θ2
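
As a rough cross-check of the threat-category conclusion, the five Θ2 evidences of Table III can be combined by brute force. The sketch assumes, purely for illustration, that Θ2 is a flat frame with the four leaves unknown, friend, neutral and hostile; it does not use the Shafer-Logan algorithm and is not meant to reproduce the values of Figure 25.

```python
from functools import reduce
from itertools import product

# Assumption: a flat four-leaf frame stands in for the Θ2 hierarchy of Figure 20.
THETA2 = frozenset({"unknown", "friend", "neutral", "hostile"})

def combine(m1, m2):
    """Dempster's orthogonal sum of two mass functions (frozenset -> mass)."""
    raw = {}
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        raw[a & b] = raw.get(a & b, 0.0) + x * y
    conflict = raw.pop(frozenset(), 0.0)
    return {s: v / (1.0 - conflict) for s, v in raw.items()}

# The five Θ2 evidences of Table III, each a simple support function.
table_iii = [("unknown", 0.2), ("friend", 0.6), ("hostile", 0.7),
             ("neutral", 0.3), ("hostile", 0.8)]
bodies = [{frozenset({name}): s, THETA2: 1.0 - s} for name, s in table_iii]
fused = reduce(combine, bodies)

# Belief and plausibility of each leaf under the fused mass function.
belief = {x: fused.get(frozenset({x}), 0.0) for x in THETA2}
plausibility = {x: sum(v for s, v in fused.items() if x in s) for x in THETA2}
best = max(belief, key=belief.get)

for x in sorted(THETA2):
    print(f"{x:8s} Bel = {belief[x]:.3f}  Pl = {plausibility[x]:.3f}")
print("best alternative:", best)
```

Under this flat-frame assumption "hostile" is the only leaf whose plausibility exceeds a threshold of 0.5, which agrees with the qualitative reading given in the text.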

7.0 CONCLUSION

This  document  is  concerned  with  the  use  of  the  Dempster-Shafer  theory  of 
evidence  for  the  fusion  of  identity  declarations  within  a  naval  environment.  It  proposes 
to  hierarchically  structure  the  identity  declarations  according  to  NATO’s  STANAG  4420 
charts,  which  provide  a  better  base  for  achieving  interoperability  in  information  exchange 
between  nations  than  uncontrolled  alternatives. 

The  Bayesian  approach  is  also  investigated  but  is  found  to  suffer  from  major 
deficiencies  in  a  hierarchical  context,  when  fully  specified  likelihoods  are  not  available. 
Other  problems  associated  with  this  approach  are  the  coding  of  ignorance,  and  the  strict 
requirements  on  the  belief  of  a  hypothesis  and  its  negation. 

One  drawback  of  the  Dempster-Shafer  evidential  theory  is  the  long  calculation 
time  required  by  its  high  computational  complexity.  Due  to  the  hierarchical  nature  of  the 
evidence,  an  algorithm  proposed  by  Shafer  &  Logan  (1987)  is  implemented  which 
reduces  the  calculations  from  exponential  to  linear  time,  proportional  to  the  number  of 
nodes  in  the  tree. 

A  semi-automated  decision  making  technique,  based  on  belief  and  plausibility 
values,  is  then  described  for  selecting  alternatives  which  best  support  the  combined 
identity  declarations.  The  final  decision  will  be  taken  by  the  decision  maker,  because 
he/she  remains  an  important  part  of  the  process  and  because  the  choice  of  the  final 
identity  is  typically  scenario  and  mission  dependent. 

The  use  of  the  Dempster-Shafer  theory  of  evidence  in  this  document  shows  it  to 
be  a  logical  method  of  combining  data  from  various  sources  to  help  the  commander  carry 
out  his/her  duties.  However,  the  flexibility  of  the  approach  should  not  hide  its 
shortcomings. For example, the normalization constant from Dempster's combination
rule may give inaccurate results, and the independence requirements may sometimes be
difficult to prove. Also, the final frame of discernment for threat categories (Θ2), which
had to be simplified due to the hierarchical constraints of the Dempster-Shafer technique,
may not be sufficiently detailed for the needs of the commander. Lastly, because no
general method for decision making from hierarchical evidence is acknowledged in the
literature, simple heuristic methods, such as the one proposed in Chapter 5, are usually
applied.

This  document  presents  initial  results  of  investigations  on  the  use  of  the 
Dempster-Shafer  approach  in  the  naval  environment.  Nevertheless,  the  results  show  that 
the  various  concepts  studied  could  be  applicable  to  the  domain  of  wide  area  fusion  within 
the  framework  of  a  Communications,  Command,  Control  and  Intelligence  (C3I)  system. 

APPENDIX A

Examples  of  the  Bayesian  Approach 
A.1  Examples  of  the  Bayesian  Approach  (general  case) 

To  better  appreciate  the  Bayesian  approach  to  uncertainty  in  terms  of 
representation  and  combination  of  information,  two  simple  applications  are  given. 

The  general  context  is  the  following:  two  possible  missile  types  (type  1  and  type  2) 
are  known  to  be  in  the  coverage  area  of  two  independent  sensors.  The  first  application 
deals  with  2  successive  identity  declarations  by  a  single  sensor  whereby  the  two 
declarations  are  missile  type  1.  Because  these  declarations  are  assumed  to  be 
conditionally  independent,  they  can  be  fused  using  (3.6).  If  the  likelihood  matrix  is  given 
by 


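\[
P(E_i \mid H_j) = \begin{pmatrix} 0.8 & 0.4 \\ 0.2 & 0.6 \end{pmatrix}
\]

and if the a priori probabilities P(H_j) are considered equal to 1/2, then the probability that
each type of missile is present after the first declaration is:

\[
P(\text{type 1} \mid E_{\text{type 1}}) = \frac{0.8 \times 0.5}{0.8 \times 0.5 + 0.4 \times 0.5} = \frac{2}{3},
\qquad
P(\text{type 2} \mid E_{\text{type 1}}) = 1 - \frac{2}{3} = \frac{1}{3}.
\]

After the second declaration, we obtain the following a posteriori probabilities:

\[
P(\text{type 1} \mid E_{\text{type 1}}, E_{\text{type 1}}) = 0.8,
\qquad
P(\text{type 2} \mid E_{\text{type 1}}, E_{\text{type 1}}) = 0.2.
\]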
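
The two successive updates can be reproduced in a few lines. The sketch below is only an illustration of Bayes' rule applied twice with the likelihood matrix quoted above; the function and variable names are chosen for the example.

```python
# Likelihood matrix P(E_i | H_j): rows are declarations, columns are hypotheses.
LIKELIHOOD = [
    [0.8, 0.4],  # P(declare type 1 | type 1), P(declare type 1 | type 2)
    [0.2, 0.6],  # P(declare type 2 | type 1), P(declare type 2 | type 2)
]

def bayes_update(prior, declared):
    """Posterior over the two missile types after one declaration (row index)."""
    joint = [l * p for l, p in zip(LIKELIHOOD[declared], prior)]
    total = sum(joint)
    return [j / total for j in joint]

prior = [0.5, 0.5]
after_first = bayes_update(prior, declared=0)         # first "type 1" declaration
after_second = bayes_update(after_first, declared=0)  # second "type 1" declaration

print([round(p, 4) for p in after_first])
print([round(p, 4) for p in after_second])
```

The posterior of the first update becomes the prior of the second, which is valid only because the two declarations are assumed conditionally independent.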


In the second application, two sensors (A and B) are capable of declaring the
identity of targets and each sensor concurrently declares the missile to be of type 1. Let us
assume that the sensors are identical such that the likelihood matrix of each sensor is given
by:

\[
P(E_i \mid H_j) = \begin{pmatrix} 0.8 & 0.4 \\ 0.2 & 0.6 \end{pmatrix}
\]

Let us assume also that the a priori probabilities P(H_j) are equal and that the sensors are
independent, so that the joint probabilities are simply the product of their individual
probabilities. The resulting likelihood matrix becomes:

\[
\begin{pmatrix}
0.8 \times 0.8 & 0.4 \times 0.4 \\
0.8 \times 0.2 & 0.4 \times 0.6 \\
0.2 \times 0.8 & 0.6 \times 0.4 \\
0.2 \times 0.2 & 0.6 \times 0.6
\end{pmatrix}
=
\begin{pmatrix}
0.64 & 0.16 \\
0.16 & 0.24 \\
0.16 & 0.24 \\
0.04 & 0.36
\end{pmatrix}
\]

where 0.64 = P(evidence from sensor A, evidence from sensor B | missile type 1)
           = P(evidence from sensor A | missile type 1) x P(evidence from sensor B | missile type 1)
           = 0.8 x 0.8

since evidences are conditionally independent.

The probability that each type of missile is present after both concurrent evidences is:

\[
P(\text{missile type 1} \mid \text{Evidence sensor A}, \text{Evidence sensor B})
= \frac{0.64 \times 0.5}{0.64 \times 0.5 + 0.16 \times 0.5} = 0.8
\]

\[
P(\text{missile type 2} \mid \text{Evidence sensor A}, \text{Evidence sensor B}) = 0.2
\]

The  fact  that  the  same  results  are  obtained  in  the  two  examples  is  not  surprising 
since,  in  both  cases,  the  successive  evidences  are  assumed  to  be  conditionally  independent. 
Whether  these  evidences  come  from  2  different  sources,  or  from  the  same  source  at  two 
different  points  in  time,  makes  no  difference  in  the  analysis. 

A.2 Example of the Technique Suggested by J. Pearl

Figure 9 provides an example of a strict hierarchical tree of hypotheses: Θ = {C,
D, I, K, L, M, N, O, P}. As before, other letters are used to represent unions of these
outcomes, e.g. B = {C, D}, G = {L, M} and H = {K, N, O, P}. A priori probabilities are
indicated for each set of interest. For example, P(G) = P(L) + P(M) = 0.3 + 0.2 = 0.5 and
similarly P(H) = 0.2. Now suppose that information is received concerning hypothesis set
H in such a way that

    P(E_1 | H) = 0.5 and P(E_1 | ¬H) = 0.2

This information can be represented differently by the likelihood matrix:

\[
P(E_i \mid H_j) = \begin{pmatrix} 0.5 & 0.2 \\ 0.5 & 0.8 \end{pmatrix}
\]

where H_1 = H, H_2 = ¬H and E_2 = ¬E_1.

If E_1 is observed, then:

    O(H) = 0.2 / 0.8 = 1/4
    λ_H = 0.5 / 0.2 = 2.5
    α_H = 1 / [(2.5 x 0.2) + 1 - 0.2] = 0.769
    P(H | E_1) = 0.769 x 2.5 x 0.2 = 0.3845

and hence P(H | E_1) represents the posterior probability of H given E_1, indicated in
parentheses beside the prior probability of H in Figure 9. To determine how this new
evidence affects H's parents and children, we use Pearl's formulas. For J, which is a child
of H, we find:

    P(J | E_1) = 0.1 x 0.769 x 2.5 = 0.19225 (child of H)

For F, which is a parent of H, we find:

    P(F | E_1) = 0.769 (0.8 - 0.2) + 0.3845 = 0.8459 (parent of H).

Updated values for every other parent and child of H are indicated in parentheses in Figure
9. The updated belief for hypothesis set A is not equal to one due to rounding-off errors.

It is important to note that Pearl's algorithm is based exclusively on the simple
concept of proportional allocation. In the above example, for instance, node H is the only
one that can be formally updated by Bayes' theorem, since the likelihood matrix merely
specified P(E_1|H) and P(E_1|¬H). This information by itself does not permit updating of
probabilities for the children or parents of H, even though it is known that these
probabilities must also have changed. To alleviate this difficulty, Pearl suggests
proportionally allocating P(E_1|H) and P(E_1|¬H) to the a priori evidence of the other nodes,
while keeping in mind that each node of the tree should acquire a belief equal to the sum
of the beliefs belonging to its immediate successors. Thus, for example, nodes J and K are
both updated to 0.19225, on the basis that their a priori probabilities were equal to 0.1,
despite the fact that no specific information is available to determine how these
probabilities have actually been affected by the evidence on H.

While  this  rule  is  reasonable,  it  is  clearly  conventional  and  may  not  always  lead  to 
an  appropriate  estimation  of  the  posterior  probabilities  associated  with  certain  nodes  of  a 
hierarchy.  On  the  other  hand,  adoption  of  such  a  rule  is  necessary  in  order  that  future 
evidence  on  node  H,  or  on  any  other  node,  could  again  be  incorporated  into  the 
probability  distribution  by  Bayes’  rule. 
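
The proportional allocation described above can be sketched as follows. The tree is deliberately reduced, as an assumption for this illustration, to the two children of H quoted in the text (J and K, prior 0.1 each) plus two placeholder nodes that lump together the rest of F (prior 0.6) and everything outside F (prior 0.2); the names `rest_of_F` and `outside_F` are hypothetical and do not appear in Figure 9.

```python
def pearl_update(leaf_priors, h_members, p_e_given_h, p_e_given_not_h):
    """Proportional-allocation update of leaf probabilities after evidence on H."""
    p_h = sum(leaf_priors[x] for x in h_members)
    lam = p_e_given_h / p_e_given_not_h    # likelihood ratio of the evidence
    alpha = 1.0 / (lam * p_h + 1.0 - p_h)  # normalizing factor
    return {
        x: alpha * (lam if x in h_members else 1.0) * p
        for x, p in leaf_priors.items()
    }

# Reduced tree: J and K form H; the other two entries are placeholders.
priors = {"J": 0.1, "K": 0.1, "rest_of_F": 0.6, "outside_F": 0.2}
post = pearl_update(priors, {"J", "K"}, p_e_given_h=0.5, p_e_given_not_h=0.2)

# Each internal node takes the sum of its immediate successors.
p_h = post["J"] + post["K"]
p_f = p_h + post["rest_of_F"]

print(round(post["J"], 4), round(p_h, 4), round(p_f, 4))
```

Because the sketch keeps the normalizing factor unrounded, its values differ from those quoted in the text in the fourth decimal place, and the updated probabilities sum to one exactly.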

APPENDIX B

Examples  of  Evidential  Theory 
B.1 Example of Terminology of Evidential Theory

A simple numerical example will help clarify the wealth of terminology associated
with the Evidential theory. Let Θ = {X,Y,Z}. The set of all subsets of Θ (2^Θ) contains 8
elements, namely {X,Y,Z}, {X,Y}, {X,Z}, {Y,Z}, {X}, {Y}, {Z}, ∅. Let us assign basic
probability numbers to each subset as follows. This is formally the same as assigning
probabilities to the preceding set of eight points, ignoring their nature, i.e., the fact that
{X} ⊂ {X,Y,Z}, for example.

    m({X,Y,Z}) = 0.1        m({X}) = 0.2
    m({X,Y}) = 0.3          m({Y}) = 0.0
    m({X,Z}) = 0.0          m({Z}) = 0.1
    m({Y,Z}) = 0.3          m(∅) = 0.0

The focal elements are the following: {X,Y,Z}, {X,Y}, {Y,Z}, {X}, {Z}. They are the sets
to which m assigns strictly positive mass. The degree of belief Bel for each subset is
obtained as follows from (4.2):

    Bel({X,Y,Z}) = 1.0      Bel({X}) = 0.2
    Bel({X,Y}) = 0.5        Bel({Y}) = 0.0
    Bel({X,Z}) = 0.3        Bel({Z}) = 0.1
    Bel({Y,Z}) = 0.4        Bel(∅) = 0.0

Thus, for example, Bel({Y,Z}) = m({Y,Z}) + m({Y}) + m({Z}) + m(∅) = 0.4.

Clearly, Bel adheres to the constraints of a belief function. As pointed out earlier, it is
possible to retrieve basic probability numbers from the degrees of belief of each subset.
For example,

    m({Y,Z}) = (-1)^0 Bel({Y,Z}) + (-1)^1 Bel({Y}) + (-1)^1 Bel({Z}) + (-1)^2 Bel(∅)
             = 0.4 - 0.0 - 0.1 + 0.0 = 0.3


The commonality number Q for each subset is easily obtained from (4.4):

    Q({X,Y,Z}) = 0.1        Q({X}) = 0.6
    Q({X,Y}) = 0.4          Q({Y}) = 0.7
    Q({X,Z}) = 0.1          Q({Z}) = 0.5
    Q({Y,Z}) = 0.4          Q(∅) = 1.0

The same results can be obtained by applying (4.3). Thus, for instance, writing ¬A for
the complement of A in Θ:

    Q({X,Y}) = (-1)^2 Bel(¬{X,Y}) + (-1)^1 Bel(¬{X}) + (-1)^1 Bel(¬{Y}) + (-1)^0 Bel(¬∅)
             = 0.1 - 0.4 - 0.3 + 1.0 = 0.4

Next, using (4.5), the degree of belief of each subset can be calculated from the
commonality number. For example,

    Bel({Y,Z}) = (-1)^1 Q({X}) + Q(∅) = -0.6 + 1.0 = 0.4

To compute the degree of plausibility of each subset, (4.6) is used:

    Pl({X,Y,Z}) = 1.0       Pl({X}) = 0.6
    Pl({X,Y}) = 0.9         Pl({Y}) = 0.7
    Pl({X,Z}) = 1.0         Pl({Z}) = 0.5
    Pl({Y,Z}) = 0.8         Pl(∅) = 0.0

The evidential interval for each subset is then as follows:

    subset {X,Y,Z} : [1.0, 1.0]      subset {X} : [0.2, 0.6]
    subset {X,Y}   : [0.5, 0.9]      subset {Y} : [0.0, 0.7]
    subset {X,Z}   : [0.3, 1.0]      subset {Z} : [0.1, 0.5]
    subset {Y,Z}   : [0.4, 0.8]      subset ∅   : [0.0, 0.0]
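
The belief, plausibility and commonality values of this example can be recomputed directly from the basic probability numbers. The sketch below uses the textbook set definitions of the three functions (sum over subsets, over intersecting sets and over supersets respectively), which are assumed here to correspond to equations (4.2), (4.6) and (4.4); the helper names are chosen for the example.

```python
from itertools import combinations

THETA = frozenset("XYZ")

# Basic probability numbers of the example (focal elements only).
m = {
    frozenset("XYZ"): 0.1,
    frozenset("XY"): 0.3,
    frozenset("YZ"): 0.3,
    frozenset("X"): 0.2,
    frozenset("Z"): 0.1,
}

def subsets(s):
    """All subsets of s, smallest first."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def bel(a):
    """Total mass of the focal elements contained in a."""
    return sum(v for b, v in m.items() if b <= a)

def pl(a):
    """Total mass of the focal elements that intersect a."""
    return sum(v for b, v in m.items() if b & a)

def q(a):
    """Total mass of the focal elements that contain a."""
    return sum(v for b, v in m.items() if a <= b)

for a in subsets(THETA):
    label = "{" + ",".join(sorted(a)) + "}"
    print(f"{label:8s} Bel = {bel(a):.1f}  Pl = {pl(a):.1f}  Q = {q(a):.1f}")
```

Each printed row gives the evidential interval [Bel, Pl] of one subset together with its commonality number.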


B.2  Proof  of  eq.  (4.12) 


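Proof: From the definition of A's commonality number (equation 4.4), one has

\[
Q(A) = \sum_{\substack{B \subseteq \Theta \\ A \subseteq B}} m(B),
\]

where $m = m_1 \oplus m_2$. Replacing m(B) with its orthogonal sum (equation 4.9) yields

\[
\begin{aligned}
Q(A) &= K \sum_{\substack{B \subseteq \Theta \\ A \subseteq B}} \;
          \sum_{\substack{i,j \\ B_i \cap C_j = B}} m_1(B_i)\, m_2(C_j) \\
     &= K \sum_{\substack{i,j \\ A \subseteq B_i \cap C_j}} m_1(B_i)\, m_2(C_j) \\
     &= K \sum_{\substack{i,j \\ A \subseteq B_i,\; A \subseteq C_j}} m_1(B_i)\, m_2(C_j) \\
     &= K \Bigl( \sum_{\substack{i \\ A \subseteq B_i}} m_1(B_i) \Bigr)
          \Bigl( \sum_{\substack{j \\ A \subseteq C_j}} m_2(C_j) \Bigr) \\
     &= K \Bigl( \sum_{\substack{B \subseteq \Theta \\ A \subseteq B}} m_1(B) \Bigr)
          \Bigl( \sum_{\substack{B \subseteq \Theta \\ A \subseteq B}} m_2(B) \Bigr) \\
     &= K\, Q_1(A)\, Q_2(A), \quad \text{for all non-empty } A \subseteq \Theta.
\end{aligned}
\]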

This  completes  the  proof. 

B.3  Example  of  Dempster's  Rule  of  Combination 

Let Θ = {X,Y,Z}, and m1 and m2 be basic probability assignments such that:

    m1({X}) = 0.2           m2({Y}) = 0.4
    m1({X,Y}) = 0.4         m2({X,Y}) = 0.4
    m1(Θ) = 0.4             m2({X,Z}) = 0.1
                            m2(Θ) = 0.1

and m_i(A) = 0, i = 1,2, for all subsets of Θ not listed.

Using Dempster's rule of combination, we proceed as follows to derive a new basic
probability assignment:

    m = m1 ⊕ m2

(each cell: intersection of the two focal elements, product of their masses)

                        m1({X}) = .2      m1({X,Y}) = .4     m1(Θ) = .4
    m2({Y}) = .4        ∅       .08       {Y}     .16        {Y}     .16
    m2({X,Y}) = .4      {X}     .08       {X,Y}   .16        {X,Y}   .16
    m2({X,Z}) = .1      {X}     .02       {X}     .04        {X,Z}   .04
    m2(Θ) = .1          {X}     .02       {X,Y}   .04        Θ       .04

    Before Normalization    After Normalization (1/K = 0.92)    Plausibility

    m({X}) = 0.16           m({X}) = 0.17                       Pl({X}) = 0.65
    m({Y}) = 0.32           m({Y}) = 0.35                       Pl({Y}) = 0.785
    m({Z}) = 0.0            m({Z}) = 0.0                        Pl({Z}) = 0.09
    m({X,Y}) = 0.36         m({X,Y}) = 0.39                     Pl({X,Y}) = 1.0
    m({X,Z}) = 0.04         m({X,Z}) = 0.045                    Pl({X,Z}) = 0.65
    m({Y,Z}) = 0.0          m({Y,Z}) = 0.0                      Pl({Y,Z}) = 0.83
    m(Θ) = 0.04             m(Θ) = 0.045                        Pl(Θ) = 1.0
    m(∅) = 0.08             m(∅) = 0.0                          Pl(∅) = 0.0

Identical results in terms of the degree of plausibility are obtained from the
commonality functions using (4.15). The commonality numbers calculated from (4.13)
and the normalizing constant from (4.14) are given below:

    Q1({X}) = 1.0           Q2({X}) = 0.6           1/K = 0.92
    Q1({Y}) = 0.8           Q2({Y}) = 0.9
    Q1({Z}) = 0.4           Q2({Z}) = 0.2
    Q1({X,Y}) = 0.8         Q2({X,Y}) = 0.5
    Q1({X,Z}) = 0.4         Q2({X,Z}) = 0.2
    Q1({Y,Z}) = 0.4         Q2({Y,Z}) = 0.1
    Q1(Θ) = 0.4             Q2(Θ) = 0.1
    Q1(∅) = 1.0             Q2(∅) = 1.0
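
The combination in this example can be reproduced by brute force. The sketch below multiplies every pair of focal elements, accumulates the products on their intersections, and renormalizes by the non-conflicting mass; it is a direct transcription of Dempster's rule for a small frame, not the hierarchical algorithm, and the function names are chosen for the example.

```python
from itertools import product

THETA = frozenset("XYZ")
m1 = {frozenset("X"): 0.2, frozenset("XY"): 0.4, THETA: 0.4}
m2 = {frozenset("Y"): 0.4, frozenset("XY"): 0.4, frozenset("XZ"): 0.1, THETA: 0.1}

def dempster(ma, mb):
    """Return (normalized orthogonal sum, mass assigned to the empty set)."""
    raw = {}
    for (a, x), (b, y) in product(ma.items(), mb.items()):
        raw[a & b] = raw.get(a & b, 0.0) + x * y
    conflict = raw.pop(frozenset(), 0.0)
    return {s: v / (1.0 - conflict) for s, v in raw.items()}, conflict

m, conflict = dempster(m1, m2)

def pl(a):
    """Plausibility of the subset spelled by the string a, e.g. 'YZ'."""
    return sum(v for s, v in m.items() if s & frozenset(a))

for s, v in sorted(m.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("{" + ",".join(sorted(s)) + "}", round(v, 3))
print("non-conflicting mass:", round(1.0 - conflict, 2))
```

The unrounded results differ slightly from some of the rounded figures tabulated above, for example 0.043 rather than 0.045 for the mass on {X,Z}.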

B.4  Voorbraak’s  Example 

In the example, the body of evidence Y is implied by X; as a consequence, evidence Y is
already taken into account in the basic probability assignment m_X. The example is
reproduced below.

    Let Θ be the frame of discernment {A, ¬A}, where A denotes the
    proposition "patient P has the flu". Suppose that X represents the
    observation that P has a fever > 39°C, that Y represents the
    observation that P has a fever > 38.5°C and that the basic probability
    assignments of X and Y are:

    m_X(A) = .6    m_X(Θ) = .4
    m_Y(A) = .4    m_Y(Θ) = .6

    According to Dempster's combination rule, Bel_X(A) ⊕ Bel_Y(A) = 0.76.
    However, since Y is implied by X, we would assume that
    Bel_X(A) ⊕ Bel_Y(A) = Bel_X(A) = .6. This is obviously not the case;
    therefore Bel_X(A) ⊕ Bel_Y(A) is unacceptable.

It is important to note that Voorbraak's example is based on the principle that an
observation is received and inference is applied to the observation to obtain a conclusion:

    observation => inference => conclusion.

Using a different terminology, we have:

    evidence => inference => hypothesis.

This is clearly depicted by Voorbraak's example reproduced above:

    example:
    X: P has fever > 39°C   => inference => A: patient P has flu
    Y: P has fever > 38.5°C => inference => A: patient P has flu

As mentioned earlier, Bel_X(A) ⊕ Bel_Y(A) is unacceptable in this context.

However, modifying Voorbraak's example by eliminating the inference process
simplifies the independence concept, since the type of evidence has changed and the
relationship between the evidence and inference process is eliminated. If we assume that
doctors C and D provide independent diagnoses, then we could say that evidences X and
Y are independent under this modified structure:

    modified example:
    X: doctor C diagnoses flu => A: patient P has flu
    Y: doctor D diagnoses flu => A: patient P has flu

Here Bel_X(A) ⊕ Bel_Y(A) is acceptable and the evidences seem independent according to
Shafer since, when viewed abstractly, the information originates from two assumed
independent diagnoses.
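
The value 0.76 quoted in Voorbraak's example follows from the closed form of Dempster's rule for two simple support functions focused on the same proposition. In that case no mass falls on the empty set, so no normalization is needed. The short sketch below only illustrates this arithmetic; the function name is chosen for the example.

```python
def combine_same_focus(s1, s2):
    """Dempster's rule for two simple support functions focused on the same A."""
    return 1.0 - (1.0 - s1) * (1.0 - s2)

bel_x, bel_y = 0.6, 0.4
bel_xy = combine_same_focus(bel_x, bel_y)

print(round(bel_xy, 2))  # 0.76
```

The combined support always exceeds both inputs, which is exactly why the rule overstates the belief when one body of evidence is already implied by the other.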

B.5  Example  of  Dempster's  Rule  with  Too  Coarse  a  Frame  of  Discernment 

Let Θ = {a, b, c, d} and Ω = {ω1, ω2}, where ψ: 2^Ω → 2^Θ is the refining given
by ψ({ω1}) = {a} and ψ({ω2}) = {b, c, d}. The set Ω is then a coarsening of Θ.
Assume that the first body of evidence produced a simple support function S1 over Θ
focused on A = {a, b}, whereas the second body of evidence produced a simple support
function S2 over Θ focused on B = {a, c}. Neither S1 nor S2 records any support for
either {a} or {b, c, d}. Therefore, S1|2^Ω, S2|2^Ω and (S1|2^Ω) ⊕ (S2|2^Ω) are all vacuous.
However, ((S1|2^Θ) ⊕ (S2|2^Θ))|2^Ω is not vacuous, since it provides the degree of support
S1(A) S2(B) > 0 for A ∩ B = {a}. Thus ((S1|2^Θ) ⊕ (S2|2^Θ))|2^Ω ≠ ((S1|2^Ω) ⊕ (S2|2^Ω))|2^Ω.

In our study of identity declaration fusion, this difficulty can be easily alleviated
by choosing the frame of discernment fine enough to discern all relevant interactions of
the evidence to be combined. This will be easily attainable due to the nature of the evidence
(identity declaration) and because the evidence will be structured in a strict hierarchy.

B.6  Proof  of  Equation  (4.16) 

Proof: 

a. For any A, one has $n_A \le n$, and hence

\[
P_m = \sum_{A \subseteq X} \frac{m(A)}{n_A} \ge \frac{1}{n} \sum_{A \subseteq X} m(A).
\]

Now since $\sum_{A \subseteq X} m(A) = 1$, it follows that $P_m \ge 1/n$. Also, $n_A \ge 1$ for
any $A \ne \emptyset$, and hence

\[
P_m \le \sum_{A \subseteq X} m(A) = 1.
\]

b. If m is vacuous, m(X) = 1 and hence $P_m = 1/n$.

If m is not vacuous, then there exists some A such that m(A) > 0 and $n_A < n$;
therefore

\[
P_m > \frac{1}{n}.
\]

c. Assume that m is Bayesian. Then the sets having m(A) > 0 are only the
singletons. Thus

\[
P_m = \sum_{i=1}^{n} m(\{x_i\}) = 1.
\]

Assume that m is not Bayesian. Then there exists some A such that m(A) > 0 and
$n_A > 1$, whence $P_m < 1$.

This completes the proof.

As an example, let Θ = {W, X, Y, Z}, and m1 and m2 be basic probability assignments
defined as follows:

    m1({X,Y}) = 0.4         m2({W,X,Y}) = 0.4
    m1({Z}) = 0.2           m2({Z}) = 0.2
    m1(Θ) = 0.4             m2(Θ) = 0.4

Again, m_i(A) = 0, i = 1,2 if A is not listed above.

Therefore,

    P_m1 = 0.4/2 + 0.2/1 = 0.4          P_m2 = 0.4/3 + 0.2/1 = 0.33

B.7 Proofs of the Entropy Measure Properties


Proof: 

a. Since for a Bayesian belief function m(A) = 0 for all non-singletons,

\[
E_m = \sum_{x \in X} m(\{x\})\, \mathrm{Con}(\mathrm{Bel}, \mathrm{Bel}_{\{x\}}).
\]

Let $g_x$ denote the basic assignment function associated with the certain support
function at {x}. Then $g_x(\{x\}) = 1$, $g_x(B) = 0$ for all other $B \subseteq X$, and

\[
\mathrm{Con}(\mathrm{Bel}, \mathrm{Bel}_{\{x\}}) = -\ln(1 - k)
\quad \text{where} \quad
k = \sum_{\substack{i,j \\ A_i \cap B_j = \emptyset}} m(A_i)\, g_x(B_j).
\]

Since $g_x(B) = 0$ for $B \ne \{x\}$ and is equal to 1 for $B = \{x\}$,

\[
k = \sum_{\substack{i \\ A_i \cap \{x\} = \emptyset}} m(A_i).
\]

Since m is Bayesian,

\[
k = \sum_{\substack{i \\ \{x_i\} \cap \{x\} = \emptyset}} m(\{x_i\})
  = \sum_{x_i \ne x} m(\{x_i\}) = 1 - m(\{x\}).
\]

Thus,

\[
\mathrm{Con}(\mathrm{Bel}, \mathrm{Bel}_{\{x\}}) = -\ln\bigl(1 - [1 - m(\{x\})]\bigr) = -\ln[m(\{x\})],
\]

and hence

\[
E_m = -\sum_{x \in X} m(\{x\}) \ln[m(\{x\})].
\]

b. Since $\mathrm{Pl}(A) \in [0,1]$ for all $A \subseteq X$, one has $\ln(\mathrm{Pl}(A)) \le 0$. Furthermore, since
$m(A) \in [0,1]$, it must be that

\[
E_m = -\sum_{A \subseteq X} m(A) \ln(\mathrm{Pl}(A)) \ge 0.
\]

Let us introduce n focal elements with values $m(A_i) = a_i$. We have

\[
E_m = -\sum_{i=1}^{n} m(A_i) \ln[\mathrm{Pl}(A_i)]
\]

where

\[
\mathrm{Pl}(A_i) = \sum_{\substack{j = i \\ A_i \cap A_j \ne \emptyset}} m(A_j)
               + \sum_{\substack{j \ne i \\ A_i \cap A_j \ne \emptyset}} m(A_j)
             = m(A_i) + d_i = a_i + d_i.
\]

Therefore, one has

\[
E_m = -\sum_{i=1}^{n} a_i \ln(a_i + d_i).
\]

As $d_i$ increases, $\ln(a_i + d_i)$ increases and $-\ln(a_i + d_i)$ decreases.
Consequently, $E_m$ is maximal when $d_i = 0$ for all i. This occurs when all the $A_i$
are disjoint. If we assume n disjoint focal elements with $m(A_i) = 1/n$, we obtain
a maximal $E_m$, namely

\[
E_m = -\sum_{i=1}^{n} \frac{1}{n} \ln\Bigl(\frac{1}{n}\Bigr) = \ln(n).
\]

c. From the definition of $E_m$, $E_m = 0$ requires that, for every A such that
$m(A) \ne 0$, one has $\ln[\mathrm{Pl}(A)] = 0$ and therefore $\mathrm{Pl}(A) = 1$. Since

\[
\mathrm{Pl}(A) = \sum_{\substack{B \\ B \cap A \ne \emptyset}} m(B),
\]

this means that every pair of focal elements must have at least one element in
common.

d. $E_m = \ln(n)$ iff $m(A_i) = 1/n$ for i = 1, 2, ..., n was proved in b. above.

This  completes  the  proof. 

Shannon's entropy measures the discordance associated with a probability
distribution (Yager, 1983). As an example, let Θ = {W, X, Y, Z}, and m1 and m2 be
basic probability assignments defined as follows:

    m1({W}) = 0.25          m2({W}) = 0.5
    m1({X}) = 0.25          m2({W,X}) = 0.25
    m1({Y}) = 0.25          m2({Z}) = 0.25
    m1({Z}) = 0.25

Then,

    E_m1 = -[0.25 ln(0.25) + 0.25 ln(0.25) + 0.25 ln(0.25) + 0.25 ln(0.25)] = 1.386
    E_m2 = -[0.5 ln(0.75) + 0.25 ln(0.75) + 0.25 ln(0.25)] = 0.562
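
The two entropy values can be checked with a short computation. The sketch below assumes the entropy measure is the sum of -m(A) ln Pl(A) over the focal elements, as used in the proofs of this section, and computes plausibility as the mass of all focal elements intersecting A; the function names are chosen for the example.

```python
import math

def plausibility(m, a):
    """Total mass of the focal elements that intersect a."""
    return sum(v for b, v in m.items() if b & a)

def entropy(m):
    """Entropy of a basic probability assignment: -sum m(A) ln Pl(A)."""
    return -sum(v * math.log(plausibility(m, a)) for a, v in m.items() if v > 0)

m1 = {frozenset("W"): 0.25, frozenset("X"): 0.25,
      frozenset("Y"): 0.25, frozenset("Z"): 0.25}
m2 = {frozenset("W"): 0.5, frozenset("WX"): 0.25, frozenset("Z"): 0.25}

e1 = entropy(m1)
e2 = entropy(m2)
print(round(e1, 3), round(e2, 3))
```

The first assignment is Bayesian with four equally weighted singletons, so its entropy reaches the maximum ln(4); the second has overlapping focal elements {W} and {W,X}, which raises their plausibility and lowers the entropy.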

APPENDIX C

The  Shafer  and  Logan  Algorithm 


C.1  Formulas  for  the  Shafer  and  Logan  Algorithm 

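The formulas for the Shafer and Logan algorithm are given below. For each node A in $\mathfrak{A}$, let

$$\begin{aligned}
A_0^+ &= Bel_A(A), & A_0^- &= Bel_A(\bar{A}),\\
A_\downarrow^+ &= Bel_A^{\downarrow}(A), & A_\downarrow^- &= Bel_A^{\downarrow}(\bar{A}),\\
A^+ &= (Bel_A \oplus Bel_A^{\downarrow})(A), & A^- &= (Bel_A \oplus Bel_A^{\downarrow})(\bar{A}),\\
A_\Diamond^+ &= (Bel_A \oplus Bel_A^{\Diamond})(A), & A_\Diamond^- &= (Bel_A \oplus Bel_A^{\Diamond})(\bar{A}),\\
A_\Theta^+ &= Bel_\Theta^{\downarrow}(A), & A_\Theta^- &= Bel_\Theta^{\downarrow}(\bar{A}).
\end{aligned}$$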


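Stage 1

Calculate $A_\downarrow^+$ and $A_\downarrow^-$ from $B^+$ and $B^-$ for B in $\ell_A$:

$$A_\downarrow^+ = 1 - K,$$

$$A_\downarrow^- = K \prod_{B \in \ell_A} \frac{B^-}{1 - B^+},$$

where $1/K = 1 + \sum_{B \in \ell_A} B^+/(1 - B^+)$.

Calculate $A^+$ and $A^-$ from $A_0^+$, $A_0^-$, $A_\downarrow^+$ and $A_\downarrow^-$:

$$A^+ = 1 - K(1 - A_0^+)(1 - A_\downarrow^+),$$

$$A^- = 1 - K(1 - A_0^-)(1 - A_\downarrow^-),$$

where $1/K = 1 - A_0^+ \cdot A_\downarrow^- - A_0^- \cdot A_\downarrow^+$.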
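The two Stage 1 steps can be sketched in code as follows. This is an illustrative sketch only: it assumes each node is summarised by a pair (belief in the node, belief in its complement), that every $B^+$ is strictly less than 1, and the function names are hypothetical. For a terminal node there are no daughters, so one would simply take $A^+ = A_0^+$ and $A^- = A_0^-$.

```python
def combine_daughters(daughters):
    """Return (A_down_plus, A_down_minus) from a list of (B_plus, B_minus) pairs."""
    inv_k = 1.0 + sum(bp / (1.0 - bp) for bp, _ in daughters)
    k = 1.0 / inv_k
    product = 1.0
    for bp, bm in daughters:
        product *= bm / (1.0 - bp)
    return 1.0 - k, k * product

def combine_dichotomous(own, down):
    """Return (A_plus, A_minus) from (A0_plus, A0_minus) and (A_down_plus, A_down_minus)."""
    (op, om), (dp, dm) = own, down
    k = 1.0 / (1.0 - op * dm - om * dp)
    return 1.0 - k * (1.0 - op) * (1.0 - dp), 1.0 - k * (1.0 - om) * (1.0 - dm)

# A node with three daughters, one of which carries B+ = .5 and no other evidence.
j_down = combine_daughters([(0.5, 0.0), (0.0, 0.0), (0.0, 0.0)])
j = combine_dichotomous((0.0, 0.0), j_down)
print(j_down, j)  # (0.5, 0.0) (0.5, 0.0)
```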


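Stage 2

Calculate $A_\Theta^+$ and $A_\Theta^-$ for A in $\ell_\Theta$ from $A^+$ and $A^-$ for A in $\ell_\Theta$:

$$A_\Theta^+ = 1 - K\left(1 + \sum_{\substack{B \in \ell_\Theta \\ B \neq A}} \frac{B^+}{1 - B^+} - \prod_{\substack{B \in \ell_\Theta \\ B \neq A}} \frac{B^-}{1 - B^+}\right),$$

$$A_\Theta^- = 1 - K(1 - A^-)/(1 - A^+),$$

where $1/K = 1 + \sum_{B \in \ell_\Theta} B^+/(1 - B^+) - \prod_{B \in \ell_\Theta} B^-/(1 - B^+)$.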
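A minimal sketch of Stage 2 for the daughters of $\Theta$ is given below. It rests on stated assumptions: each daughter is summarised by its pair ($A^+$, $A^-$) with $A^+ < 1$; the normalisation is taken as $1/K = 1 + \sum B^+/(1 - B^+) - \prod B^-/(1 - B^+)$ over all daughters; and the product term is placed inside the factor multiplied by K. The function name `stage2` and the dictionary layout are illustrative.

```python
from math import prod

def stage2(daughters):
    """Map each daughter of Theta to (A_Theta_plus, A_Theta_minus).

    daughters: dict name -> (A_plus, A_minus), with A_plus < 1.
    """
    s = {n: p / (1.0 - p) for n, (p, m) in daughters.items()}
    r = {n: m / (1.0 - p) for n, (p, m) in daughters.items()}
    k = 1.0 / (1.0 + sum(s.values()) - prod(r.values()))
    totals = {}
    for a, (p, m) in daughters.items():
        s_other = sum(v for n, v in s.items() if n != a)
        r_other = prod(v for n, v in r.items() if n != a)
        totals[a] = (1.0 - k * (1.0 + s_other - r_other),
                     1.0 - k * (1.0 - m) / (1.0 - p))
    return totals

# Two daughters of Theta: one with no evidence, one with A+ = .5.
totals = stage2({"B": (0.0, 0.0), "F": (0.5, 0.0)})
print(totals)  # {'B': (0.0, 0.5), 'F': (0.5, 0.0)}
```

With evidence only against one daughter, for example `stage2({"A": (0.0, 0.0), "B": (0.0, 0.4)})`, the sketch assigns belief 0.4 to the other daughter, which is what Dempster's rule gives when the two daughters partition $\Theta$.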
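Stage 3

Calculate $A_\Diamond^+$ and $A_\Diamond^-$ from $A_\Theta^+$, $A_\Theta^-$, $A_\downarrow^+$ and $A_\downarrow^-$:

$$A_\Diamond^+ = 1 - K\,\frac{1 - A_\Theta^+}{1 - A_\downarrow^+},$$

$$A_\Diamond^- = 1 - K\,\frac{1 - A_\Theta^-}{1 - A_\downarrow^-},$$

where

$$1/K = \frac{1 - A_\Theta^+}{1 - A_\downarrow^+} + \frac{1 - A_\Theta^-}{1 - A_\downarrow^-} - \frac{1 - A_\Theta^+ - A_\Theta^-}{1 - A_\downarrow^+ - A_\downarrow^-}.$$

Calculate $B_A^+$, $B_A^-$ and $B_A^*$ from $C^+$ and $C^-$ for C in $\ell_A$:

$$B_A^+ = 1 - K\left(1 + \sum_{\substack{C \in \ell_A \\ C \neq B}} \frac{C^+}{1 - C^+}\right),$$

$$B_A^- = 1 - K(1 - B^-)/(1 - B^+),$$

$$B_A^* = 1 - K\left(1 + \sum_{\substack{C \in \ell_A \\ C \neq B}} \frac{C^+}{1 - C^+} - \prod_{\substack{C \in \ell_A \\ C \neq B}} \frac{C^-}{1 - C^+}\right),$$

where $1/K = 1 + \sum_{C \in \ell_A} C^+/(1 - C^+)$.

Calculate $B_\Theta^+$ and $B_\Theta^-$ from $A_\Diamond^+$, $A_\Diamond^-$, $A_\downarrow^+$, $A_\downarrow^-$, $B_A^+$, $B_A^-$ and $B_A^*$, where B is a daughter of A:

$$B_\Theta^+ = K\left(A_\Diamond^+ (B_A^* - A_\downarrow^-) + (1 - A_\Diamond^+ - A_\Diamond^-)\,B_A^+\right),$$

$$B_\Theta^- = 1 - K(1 - A_\Diamond^-)(1 - B_A^-),$$

where $1/K = 1 - A_\downarrow^+ \cdot A_\Diamond^- - A_\downarrow^- \cdot A_\Diamond^+$.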
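The Stage 3 steps can be sketched as below. This is a hedged illustration rather than a reference implementation: it assumes the first step divides the total belief of A by the belief coming from below A, that all denominators are non-zero, and that the third daughter quantity follows the same pattern as the Stage 2 formula. The names `decombine`, `daughter_totals`, `outer` and `down` are hypothetical.

```python
from math import prod

def decombine(total, down):
    """First step: recover the pair for 'A's own evidence plus everything not below A'."""
    (tp, tm), (dp, dm) = total, down
    inv_k = ((1 - tp) / (1 - dp) + (1 - tm) / (1 - dm)
             - (1 - tp - tm) / (1 - dp - dm))
    k = 1.0 / inv_k
    return 1 - k * (1 - tp) / (1 - dp), 1 - k * (1 - tm) / (1 - dm)

def daughter_totals(outer, down, daughters):
    """Total (belief, belief in complement) for each daughter B of A.

    outer: pair returned by decombine for A; down: (A_down_plus, A_down_minus);
    daughters: dict name -> (B_plus, B_minus) from Stage 1.
    """
    (op, om), (dp, dm) = outer, down
    s = {n: p / (1 - p) for n, (p, m) in daughters.items()}
    r = {n: m / (1 - p) for n, (p, m) in daughters.items()}
    k1 = 1.0 / (1 + sum(s.values()))
    k2 = 1.0 / (1 - dp * om - dm * op)
    out = {}
    for b, (p, m) in daughters.items():
        s_other = sum(v for n, v in s.items() if n != b)
        r_other = prod(v for n, v in r.items() if n != b)
        b_plus = 1 - k1 * (1 + s_other)             # belief in B from below A
        b_minus = 1 - k1 * (1 - m) / (1 - p)        # belief in not-B from below A
        b_star = 1 - k1 * (1 + s_other - r_other)   # belief in B or not-A from below A
        out[b] = (k2 * (op * (b_star - dm) + (1 - op - om) * b_plus),
                  1 - k2 * (1 - om) * (1 - b_minus))
    return out

# A node with total (.5, 0), evidence from below (.5, 0), and three daughters.
f_outer = decombine((0.5, 0.0), (0.5, 0.0))
f_daughters = daughter_totals(f_outer, (0.5, 0.0),
                              {"G": (0.0, 0.0), "H": (0.5, 0.0), "I": (0.0, 0.0)})
print(f_outer, f_daughters)
```

For these inputs the sketch yields (0, 0) for the first step and the pairs (0, .5), (.5, 0) and (0, .5) for the three daughters.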


C.2  Example  of  Dempster's  Combination  Rule  with  Bayesian  Belief  Functions 

This  example  uses  the  strict  hierarchical  tree  illustrated  in  Figure  9  of  Subsection 
3.6.3.  The  a  priori  probabilities  are  indicated  for  each  set  of  interest.  Information 
received  concerns  hypothesis  H: 


$P(E_1|H) = 0.5$ and $P(E_1|\bar{H}) = 0.2$
To be compatible with the input of Dempster's combination rule, this information must be transformed as follows:

$$P(H|E_1) = \frac{P(E_1|H)}{P(E_1|H) + P(E_1|\bar{H})}$$

such that $P(H|E_1) = 0.714$ and $P(\bar{H}|E_1) = 0.286$. We therefore obtain the following basic probability assignments:

$m_1(\{H\}) = m_1(\{K, N, O, P\}) = 0.714,$
$m_1(\{\bar{H}\}) = m_1(\{C, D, I, L, M\}) = 0.286,$
$m_1(\Theta) = m_1(\{C, D, I, K, L, M, N, O, P\}) = 0.0,$

which represent a simple support function focused on a subset of $\Theta$ and its complement, with no uncertainty. This basic probability assignment is then combined with the a priori probability of hypothesis H (0.2) using Dempster's combination rule, $m = m_1 \oplus m_2$:

$m_1(\{H\}) = m_1(\{K, N, O, P\}) = 0.7143$
$m_1(\{\bar{H}\}) = m_1(\{C, D, I, L, M\}) = 0.2857$
$m_1(\Theta) = 0.0$

$m_2(\{H\}) = m_2(\{K, N, O, P\}) = 0.2$
$m_2(\{G\}) = m_2(\{L, M\}) = 0.5$
$m_2(\{I\}) = 0.1$
$m_2(\{B\}) = m_2(\{C, D\}) = 0.2$


Before Normalization                After Normalization ($K^{-1} = 0.37142$)

m({K, N, O, P}) = 0.14286           m({K, N, O, P}) = 0.3846
m({L, M}) = 0.14285                 m({L, M}) = 0.3846
m({I}) = 0.02857                    m({I}) = 0.0769
m({C, D}) = 0.05714                 m({C, D}) = 0.1538
m($\emptyset$) = 0.62858                      m($\emptyset$) = 0.0
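The combination above can be reproduced with a few lines of code. The sketch below is illustrative only: it assumes focal elements are represented as `frozenset` objects over the leaf labels, takes $m_2(\{L, M\}) = 0.5$ so that the prior masses sum to one, and uses the hypothetical helper name `dempster`.

```python
def dempster(m1, m2):
    """Dempster's rule: return (normalised combined masses, mass sent to the empty set)."""
    raw, conflict = {}, 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            c = a & b
            if c:
                raw[c] = raw.get(c, 0.0) + x * y
            else:
                conflict += x * y
    norm = 1.0 - conflict
    return {c: v / norm for c, v in raw.items()}, conflict

H = frozenset("KNOP")
NOT_H = frozenset("CDILM")
G = frozenset("LM")
I = frozenset("I")
B = frozenset("CD")

p_h = 0.5 / (0.5 + 0.2)              # P(H|E1) = 0.714...
m1 = {H: p_h, NOT_H: 1.0 - p_h}      # evidence about H, no mass on Theta
m2 = {H: 0.2, G: 0.5, I: 0.1, B: 0.2}  # a priori probabilities

m, conflict = dempster(m1, m2)
for name, s in (("H", H), ("G", G), ("I", I), ("B", B)):
    print(name, round(m[s], 4))
print("conflict", round(conflict, 4))
```

In exact arithmetic the normalised masses are 5/13, 5/13, 1/13 and 2/13, and the conflicting mass is 22/35.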
C.3  Example of Shafer & Logan Algorithm

This  example,  in  which  6  sets  of  evidence  are  combined,  illustrates  the  propagation 
effect  of  the  Shafer  &  Logan  algorithm.  The  same  strict  hierarchical  tree  as  above  is  used. 
The  example  is  composed  of  6  steps,  at  which  additional  evidence  is  received  for  a  specific 
node  in  the  form  of  a  simple  support  function  or  dichotomous  function,  and  then 
combined. The new belief (Bel) and plausibility (Pl) values are calculated for each node.
As a reminder, for each node A in the tree:

$Bel(A) = Bel_\Theta^{\downarrow}(A) = A_\Theta^+$ (according to Annex B),

$Pl(A) = 1 - Bel(\bar{A}) = 1 - Bel_\Theta^{\downarrow}(\bar{A}) = 1 - A_\Theta^-$ (according to Annex B).

The  evidence  to  be  combined  is  as  follows: 

step 1:  $m(\{N\}) = .5$,  $m(\Theta) = .5$

step 2:  $m(\{H\}) = .5$,  $m(\Theta) = .5$

step 3:  $m(\{F\}) = .9$,  $m(\Theta) = .1$

step 4:  $m(\{F\}) = .0$,  $m(\{\bar{F}\}) = .5$,  $m(\Theta) = .5$

step5 :  m({B})  =  .99  m({F»  =  .01  m(0)  =  .0 

step 6:  $m(\{B\}) = .05$,  $m(\Theta) = .95$

At  step  0,  all  the  belief  and  plausibility  values  of  each  node  are  zero.  Figures  13  to  18 
show  the  results  of  the  Shafer-Logan  algorithm  after  adding  evidence  from  step  1  to  6 
respectively. 

To demonstrate the use of the various formulas of the Shafer-Logan algorithm given in Section C.1, calculations are shown below for step 1.


Stage  0 

$m(\{N\}) = Bel(\{N\}) = .5$ and $m(\Theta) = .5$
Stage 1

Let A = node J; then

$1/K = 1 + .5/(1-.5) + 0/1 + 0/1 = 2$;  $K = .5$
$J_\downarrow^+ = 1 - K = .5$
$J_\downarrow^- = .5\,(0/(1-.5) \times 0/1 \times 0/1) = 0$
$1/K = 1 - 0 \times 0 - 0 \times .5 = 1$;  $K = 1$
$J^+ = 1 - 1(1-0)(1-.5) = .5$
$J^- = 1 - 1(1-0)(1-0) = 0$

Let A = node H; then

$1/K = 1 + .5/(1-.5) + 0/1 = 2$;  $K = .5$
$H_\downarrow^+ = 1 - K = .5$
$H_\downarrow^- = .5\,(0/(1-.5) \times 0/1) = 0$
$1/K = 1 - 0 \times 0 - 0 \times .5 = 1$;  $K = 1$
$H^+ = 1 - 1(1-0)(1-.5) = .5$
$H^- = 1 - 1(1-0)(1-0) = 0$

Let A = node F; then

$1/K = 1 + 0/1 + .5/(1-.5) + 0/1 = 2$;  $K = .5$
$F_\downarrow^+ = 1 - K = .5$
$F_\downarrow^- = .5\,(0/1 \times 0/(1-.5) \times 0/1) = 0$
$1/K = 1 - 0 \times 0 - 0 \times .5 = 1$;  $K = 1$
$F^+ = 1 - 1(1-0)(1-.5) = .5$
$F^- = 1 - 1(1-0)(1-0) = 0$

Let A = node G; then

$G_\downarrow^+ = G_\downarrow^- = G^+ = G^- = 0$

Let A = node B; then

$B_\downarrow^+ = B_\downarrow^- = B^+ = B^- = 0$


Stage 2

$1/K = 1 + (0/1 + .5/(1-.5)) - (0/1 \times 0/.5) = 2$;  $K = .5$
$B_\Theta^+ = 1 - .5(1 + .5/(1-.5) - 0) = 0$
$B_\Theta^- = 1 - .5(1-0)/(1-0) = .5$
$F_\Theta^+ = 1 - .5(1 + 0 - 0) = .5$
$F_\Theta^- = 1 - .5(1-0)/(1-.5) = 0$


Stage  3 

Let A = node B; then

$1/K = (1-0)/(1-0) + (1-.5)/(1-0) - (1-0-.5)/(1-0-0) = 1$
$B_\Diamond^+ = 1 - 1(1-0)/(1-0) = 0$
$B_\Diamond^- = 1 - 1(1-.5)/(1-0) = .5$

$1/K = 1 + 0/1 + 0/1 = 1$;  $K = 1$
$C_B^+ = 1 - 1(1 + 0/(1-0)) = 0$
$C_B^- = 1 - 1(1-0)/(1-0) = 0$
$C_B^* = 1 - 1(1 + 0/1 - 0/1) = 0$

and in a similar fashion,

$D_B^+ = D_B^- = D_B^* = 0$

$1/K = 1 - 0 \times .5 - 0 \times 0 = 1$
$C_\Theta^+ = 1\,(0 \times (0-0) + (1-0-.5) \times 0) = 0$
$C_\Theta^- = 1 - 1(1-.5)(1-0) = .5$

and in a similar fashion,

$D_\Theta^+ = 0$ and $D_\Theta^- = .5$

Let A = node F; then

$1/K = (1-.5)/(1-.5) + (1-0)/(1-0) - (1-.5-0)/(1-.5-0) = 1$
$F_\Diamond^+ = 1 - 1(1-.5)/(1-.5) = 0$
$F_\Diamond^- = 1 - 1(1-0)/(1-0) = 0$

$1/K = 1 + 0/(1-0) + .5/(1-.5) + 0/(1-0) = 2$;  $K = .5$
$G_F^+ = 1 - .5(1 + .5/(1-.5) + 0/(1-0)) = 0$
$G_F^- = 1 - .5(1-0)/(1-0) = .5$
$G_F^* = 1 - .5(1 + .5/(1-.5) + 0/(1-0) - 0) = 0$
$H_F^+ = 1 - .5(1 + 0/(1-0) + 0/(1-0)) = .5$
$H_F^- = 1 - .5(1-0)/(1-.5) = 0$
$H_F^* = 1 - .5(1 + 0/(1-0) + 0/(1-0) - 0) = .5$
$I_F^+ = 1 - .5(1 + 0/(1-0) + .5/(1-.5)) = 0$
$I_F^- = 1 - .5(1-0)/(1-0) = .5$
$I_F^* = 1 - .5(1 + 0/(1-0) + .5/(1-.5) - 0) = 0$

$1/K = 1 - .5 \times 0 - 0 \times 0 = 1$
$G_\Theta^+ = 1\,(0 \times (0-0) + (1-0-0) \times 0) = 0$
$G_\Theta^- = 1 - 1(1-0)(1-.5) = .5$
$H_\Theta^+ = 1\,(0 \times (.5-0) + (1-0-0) \times .5) = .5$
$H_\Theta^- = 1 - 1(1-0)(1-0) = 0$
$I_\Theta^+ = 1\,(0 \times (0-0) + (1-0-0) \times 0) = 0$
$I_\Theta^- = 1 - 1(1-0)(1-.5) = .5$

We have thus obtained Bel(B), Bel(C), Bel(D), Bel(F), Bel(G), Bel(H), Bel(I), where

$Bel(B) = Bel_\Theta^{\downarrow}(B) = B_\Theta^+ = 0$ and
$Pl(B) = 1 - Bel_\Theta^{\downarrow}(\bar{B}) = 1 - B_\Theta^- = 1 - .5 = .5$.


The  belief  and  plausibility  of  the  other  nodes  can  be  obtained  in  a  similar  fashion  using 
stage  3  of  the  algorithm. 
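Because step 1 involves a single simple support function, the values produced by the algorithm can be cross-checked directly from the definitions $Bel(A) = \sum_{S \subseteq A} m(S)$ and $Pl(A) = \sum_{S \cap A \neq \emptyset} m(S)$. The sketch below is a sanity check under an assumption: the leaf sets of the nodes are inferred from Section C.2 (H = {K, N, O, P}, G = {L, M}, B = {C, D}) and from the calculations above, with F taken as the union of G, H and I; the dictionary `NODES` is therefore a hypothetical encoding of the tree, not a copy of the figure.

```python
THETA = frozenset("CDIKLMNOP")

# Assumed leaf sets for the nodes whose beliefs are computed in step 1.
NODES = {
    "B": frozenset("CD"),
    "C": frozenset("C"),
    "D": frozenset("D"),
    "F": frozenset("IKLMNOP"),
    "G": frozenset("LM"),
    "H": frozenset("KNOP"),
    "I": frozenset("I"),
}

def bel(m, a):
    """Belief: total mass of focal elements contained in a."""
    return sum(v for s, v in m.items() if s <= a)

def pl(m, a):
    """Plausibility: total mass of focal elements that intersect a."""
    return sum(v for s, v in m.items() if s & a)

# Evidence of step 1: m({N}) = .5, m(Theta) = .5
step1 = {frozenset("N"): 0.5, THETA: 0.5}

for name, leaves in NODES.items():
    print(name, bel(step1, leaves), pl(step1, leaves))
```

Under these assumptions the check gives Bel = 0 and Pl = .5 for B, C, D, G and I, and Bel = .5 and Pl = 1 for F and H, in agreement with the pairs obtained in stages 2 and 3.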